Running Stable Diffusion in 260MB of RAM (github.com/vitoplantamura)
293 points by Robin89 on July 20, 2023 | 64 comments


I like the use of a tiny device to generate the images. I was wondering whether the energy consumption per image would be lower, but I did the simple maths and it's not the case.

A Raspberry Pi Zero 2 W seems to use about 6W under load (source: https://www.cnx-software.com/2021/12/09/raspberry-pi-zero-2-... )

So if it takes 3 hours to generate one picture, that's about 18Wh per image.

An Nvidia Tesla or RTX GPU can generate a similar picture very quickly. Assuming one second per image and 350W under load for the whole system, it's on the order of 0.1Wh per image.

Of course, we could consider that a Raspberry Pi Zero uses a lot fewer resources and less energy to manufacture and transport.
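
Back-of-envelope, here's the same comparison as a tiny Python sketch; the wattages and timings are just the rough figures above, not measurements:

    def wh_per_image(power_watts, seconds_per_image):
        # Energy per image in watt-hours: power (W) x time (h)
        return power_watts * seconds_per_image / 3600

    pi_zero = wh_per_image(6, 3 * 3600)   # ~18 Wh per image
    gpu_box = wh_per_image(350, 1)        # ~0.1 Wh per image

    print(f"Pi Zero 2 W: {pi_zero:.1f} Wh/image")
    print(f"GPU system:  {gpu_box:.2f} Wh/image")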


Would an accelerator such as the Intel Neural Compute Stick 2 work with this? It can be plugged into a Pi, but I'm not clear on how VRAM works on the compute stick, or whether it's shared with the host.


For on-prem use, the up-front cost is a lot lower. The A100 that most serious outfits are using runs in the thousands to tens of thousands of dollars per unit, with very limited availability. The Pi is typically under $75 USD for any variant.


An RTX 4090 is a much better value for Stable Diffusion, but yes, if you start to think about cost, the Pi wins. If you think about availability, I'm not sure.


The big immediate plus here is that if you live somewhere with limited access to the internet, you can still generate imagery offline on a low-end laptop, like a protest group in far Eastern Europe or other areas. My personal travel laptop only has 8GB of memory, so it's exciting to be able to try out an idea even if I don't have high-end hardware.


An RTX 3090 hits the current sweet spot of price/performance for me. Half the throughput of the 4090, but at a third of the cost. (I needed the 24GB of VRAM for other LLM projects.)


Is this brand new or used?


Used is the only way to get a 3090 for ~$650-$750 (they're not hard to find on eBay in that general price range).


Used from ebay in my case.


Incredible! If only there were some cheap, hackable e-ink frame, you could make a fully self-contained artwork from an e-ink panel + RPi that's (slowly) continuously updating itself!


There definitely are some: https://shop.pimoroni.com/search?q=e-ink

And now I think I know what my next project is going to be. I'm sure I can find some desk space.


Yessss! I looked into building some self-contained "slow tech" generative art using e-ink a couple of years ago, but it was just impossible on my tiny budget. This is great, thanks!!

Edit: I'm so hyped about this; the example image in TFA takes 2+ hours to generate, but who cares?! I'd love to have a little display that churns away in the background and creates a new variation on my prompt every however-many hours, displaying the results on an unobtrusive e-ink screen.
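
The loop itself would be tiny. A rough sketch in Python (sd_generate and show_on_eink are hypothetical stand-ins for whatever Stable Diffusion build and e-ink driver you end up with):

    import subprocess, time
    from pathlib import Path

    OUT_DIR = Path("gallery")      # keep every image in case one turns out great
    PROMPT = "a quiet mountain lake at dawn, ukiyo-e style"
    INTERVAL_S = 6 * 3600          # a fresh picture every few hours is plenty

    def sd_generate(out_path):
        # Hypothetical: shell out to whatever SD binary/script runs on the Pi.
        subprocess.run(["./sd", "--prompt", PROMPT, "--output", str(out_path)],
                       check=True)

    def show_on_eink(path):
        # Hypothetical: replace with your e-ink panel's driver call.
        pass

    OUT_DIR.mkdir(exist_ok=True)
    while True:
        out = OUT_DIR / f"{int(time.time())}.png"
        sd_generate(out)
        show_on_eink(out)
        time.sleep(INTERVAL_S)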


Is it possible to incorporate a personalized "context" into the generator? Weather, market/news sentiment, calendar events, etc., to style the end result.


I love the idea.


Make sure you build in a capacity to save all the previous iterations in case you see something you really like.


Haha I like the idea of walking past, glancing now and then to see if there's something you really love...

but on the other hand I would also love the statement behind something unconnected to the internet that's slowly churning out unique, ephemeral pictures. Yours to enjoy, then gone forever.


You can make a digital sand mandala [1]

[1] https://en.m.wikipedia.org/wiki/Sand_mandala


I made one before (https://dheera.net/projects/einkframe/) that used ShanShui (https://github.com/LingDong-/shan-shui-inf)

I'm thinking of making a Stable Diffusion version of this, and preferably with a larger eInk screen.


You can use mine:

https://www.stavros.io/posts/making-the-timeframe/

You just put an image on some HTTP server and it shows it.


Listen to the speech in the room and, based on hot topics, generate a set of pictures for tomorrow.


Like a continuously updating wall-mounted newspaper[1]?

[1] https://imgur.io/a/NoTr8XX (no, I don’t know why anyone would use Imgur to write up a hack either)


Waveshare and Pimoroni have some that work well with Raspberry Pi, if they're in your budget. I built a Waveshare epaper display + Pi Zero into a photo frame for a totally different project. Your idea tempts me.


Two years ago people were already hacking updateable photo frames out of them:

https://www.youtube.com/watch?v=YawP9RjPcJA


Does this mean you could fit its whole working set in the cache hierarchy of a modern high-end GPU, getting near 100% ALU utilisation?


It streams the weights. This is going to be what limits performance, not ALU utilization.
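
Conceptually something like this (a sketch of the general weight-streaming idea, not the repo's actual implementation; the per-layer .npy files are hypothetical):

    import numpy as np

    def streamed_forward(layer_weight_files, x):
        # Only one layer's weights are ever resident in RAM, at the cost
        # of re-reading them from disk on every forward pass.
        for path in layer_weight_files:
            w = np.load(path)   # load just this layer's weights from disk
            x = x @ w
            del w               # free them before touching the next layer
        return x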


In 260 megs of RAM?!? I'm going to try this on my Amiga!

Check back in a few months for my results...


Look at moneybags over here with his "megs" of RAM. I think mine only had 256K available after the Kickstart disk was loaded.



I remember rendering a few ray-traced balls overnight on the Amiga, good times.


Incredible! The march to get more models running on the edge continues, much faster than I anticipated. The static quantization and slicing techniques here are pretty cool.


I’ve been amazed at how quickly the open source community has iterated on LLMs and Diffusion models. Goes to show how well open source can work.


Innovation in the tech world is spurred by open access.


Support the open companies. Avoid the closed ones, even if they are fantastic at marketing. ;)


Wait, are these inference times real? One second on a RasPi? Am I getting this right? This is faster than on my GPU. What's going on here?


Pretty sure that is just the text encoding step. Generating a complete image took 3h if I read correctly.

Update: "Tests were run on my development machine: Windows Server 2019, 16GB RAM, 8750H cpu (AVX2), 970 EVO Plus SSD, 8 virtual cores on VMWare."


I think it's the inference time per iteration.


ahh thanks


That's really cool! I always thought you needed a good amount of GPU VRAM to generate images using SD.

I wonder how fast a consumer PC with no GPU would generate an image with, say, 16GB of RAM?


On an Apple M1 with 16GB of RAM, without PyTorch compiled to take advantage of Metal, it could take 12 minutes to generate an image from a tweet-length prompt. With Metal, it takes less than 60 seconds.
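
For anyone wanting to try the Metal path: recent PyTorch builds expose it as the "mps" device, so with the diffusers pipeline it's roughly this (the v1.5 checkpoint is just an example; any SD 1.x checkpoint loads the same way):

    import torch
    from diffusers import StableDiffusionPipeline

    device = "mps" if torch.backends.mps.is_available() else "cpu"

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    pipe = pipe.to(device)  # "mps" routes the UNet/VAE work through Metal

    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("out.png")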


Prompt length shouldn't influence creation time, at least it didn't in any of the implementations I used.

What is the resolution of your images and number of steps?


Defaults from the Hugging Face repo, just copy-pasted. So, IIRC, 50 steps and the image is 512x512.

Edit: confirmed.

> Prompt length shouldn't influence creation time...

Yeah, checks out with my experience too. Longer prompts were truncated.


Some tools (e.g. Automatic1111) are able to feed in longer prompts, but then the prompt length does affect the speed of inference.

Albeit in 77-token increments.


And PyTorch on the M1 (without Metal) uses the fast AMX matrix multiplication units (through the Accelerate framework). Matrix multiplication on the M1 is on par with ~10 threads/cores of a Ryzen 5900X.

[1] https://github.com/danieldk/gemm-benchmark#example-results


Wtf, my four-year-old, $400, crappy low-wattage computer can generate a picture in a minute or two.

DDIM, 12 steps.


Metal is such an advantage, had no idea


I was using a roughly six-year-old AMD CPU with 16GB of RAM, and generating from a prompt would take about half an hour. Which is still massively impressive for what it is.


Use a free GPU from Google Colab and you can do the same in about 15 seconds...


Yes, and if he does it on a paid machine with a better GPU, it'll be even faster!

While true, neither your statement nor mine above is germane to the discussion. It wasn't about how long it takes. It's a discussion of how cool it is that it can be done on that machine at all.


Do you have a Google Colab link?


On 21 April 2023, Google blocked usage of Stable Diffusion with a free account on Colab. You need a paid plan to use it.

Apparently there are ways around it, but I just switched to runpod.io. It's very cheap (around $0.80/h for a 4090 including storage) and having a real terminal is worth it.


There is no shortage of Google Colab Stable Diffusion tutorials on the web.


Which is why asking for one high-quality starting point is such a useful question.


"It runs Stable Diffusion" is the new "It runs Doom".


Now I'm wondering: could a monkey hitting random keys on a keyboard for an infinite amount of time eventually come up with the right prompts to get GPT-4 to produce code that compiles to a faithful reproduction of Doom?


Probably more easily than you'd think. DOOM is open source[1], and as GP alludes, is probably the most frequently ported game in existence, so its source code almost certainly appears multiple times in GPT-4's training set, likely alongside multiple annotated explanations.

[1] https://github.com/id-Software/DOOM


Well, not the most ported; the Z-Machine, with tons of games (even ones legally available from the IF Archive with great programming, such as Curses!, Jigsaw, and Anchorhead), might be. It even runs on the Game Boy, up to v3 games. Z5 and Z8 games will run fine on a 68020 and beyond.


Now I'm wondering: if there were two monkeys hitting random keys on a keyboard for an infinite amount of time, one typing into the GPT-4 prompt and the other typing 0s and 1s straight, who would produce Doom code faster?


Yes. Infinity is weird.


No, because GPT-4 has finite memory (its context length), and its random number generator for output selection is probably pseudo-random with finite internal state.

If the random number generator is pseudo-random, this makes GPT-4 a deterministic finite-state machine, and the output sequence does not necessarily contain all possible subsequences no matter how many times the monkey types a new random key. Put differently, some output subsequences may be inaccessible no matter which keys are input. Same if the random number generator is truly random but its value cannot select among all possible output tokens, only a subset provided by the GPT at each step.


That's a good point, I hadn't considered the limits of GPT's memory.


3 hours for 8-bit. I wonder what it would be if you went further. Greyscale? Black & white?


This is really neat! Always cool to see what people can do with less.


Interesting. Which platform/PC config did you use?


Amazing work!



