Hacker News

On an Apple M1 with 16 GB of RAM, without PyTorch compiled to take advantage of Metal, it could take 12 minutes to generate an image from a tweet-length prompt. With Metal, it takes less than 60 seconds.
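The thread doesn't show the exact invocation, but the Metal path in PyTorch is just a device choice ("mps"). A minimal sketch of the usual fallback order; `pick_device` is an illustrative helper, and the commented diffusers call is an assumption about the setup described, not confirmed by the comment:

```python
# Illustrative sketch: PyTorch's Metal backend is selected via the "mps"
# device string (torch >= 1.12). pick_device is plain Python; the commented
# lines below show how it would plug into a Hugging Face diffusers pipeline.

def pick_device(mps_available: bool, cuda_available: bool = False) -> str:
    """Prefer CUDA, then Metal (MPS), then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With torch and diffusers installed, this would look roughly like:
#   import torch
#   from diffusers import StableDiffusionPipeline
#   device = pick_device(torch.backends.mps.is_available(),
#                        torch.cuda.is_available())
#   pipe = StableDiffusionPipeline.from_pretrained(
#       "runwayml/stable-diffusion-v1-5").to(device)
```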


Prompt length shouldn't influence creation time, at least it didn't in any of the implementations I used.

What resolution are your images, and how many steps are you using?


Defaults from the Hugging Face repo, just copy-pasted. So, IIRC, 50 steps and a 512x512 image.

Edit: confirmed.

> Prompt length shouldn't influence creation time...

Yeah, checks out with my experience too. Longer prompts were truncated.


Some tools (e.g. Automatic1111) are able to feed in longer prompts, but then the prompt length does affect inference speed.

Albeit in 77-token increments.
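That matches how CLIP's text encoder works: its context window is a fixed 77 tokens, so tools that accept longer prompts split them into 77-token chunks and run the encoder once per chunk. A rough sketch of the chunking; the token IDs here are placeholders, and a real pipeline would use CLIP's tokenizer:

```python
# Sketch of why speed changes in 77-token increments: each chunk of up to
# 77 tokens costs one extra pass through the text encoder.

def chunk_tokens(token_ids, chunk_size=77):
    """Split a token-ID list into fixed-size chunks; the last may be shorter."""
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

# A 160-token prompt would need 3 encoder passes (77 + 77 + 6).
encoder_passes = len(chunk_tokens(list(range(160))))
```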


And PyTorch on the M1 (without Metal) uses the fast AMX matrix multiplication units (through the Accelerate framework). Matrix multiplication on the M1 is on par with ~10 threads/cores of a Ryzen 5900X [1].

[1] https://github.com/danieldk/gemm-benchmark#example-results
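For context on how such GEMM benchmarks are typically scored: an n×n matrix multiply costs about 2n³ floating-point operations, so throughput falls out of the measured time. A small sketch of the arithmetic; the 0.01 s timing below is hypothetical, not a measured M1 or Ryzen number:

```python
# GEMM throughput arithmetic: an n x n x n matrix multiply performs roughly
# 2 * n**3 floating-point operations (one multiply + one add per term).

def gemm_gflops(n: int, seconds: float) -> float:
    """Throughput of an n x n x n matrix multiplication, in GFLOPS."""
    return 2 * n**3 / seconds / 1e9

# Hypothetical example: a 1024 x 1024 multiply finishing in 10 ms
# works out to about 215 GFLOPS.
example = gemm_gflops(1024, 0.01)
```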


Wtf, my 4-year-old, $400, crappy low-wattage computer can generate a picture in a minute or two.

DDIM, 12 steps.
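Step count largely explains the gap: sampling time scales roughly linearly with the number of steps. Back-of-the-envelope, using the 50-step / 60-second Metal figure quoted upthread (an assumption for illustration, not a benchmark):

```python
# Sampling time is roughly linear in step count, so per-step cost is a
# useful yardstick. Figures below are the M1 + Metal numbers from upthread.

steps_default, seconds_default = 50, 60
per_step = seconds_default / steps_default   # ~1.2 s per step
estimate_12_steps = per_step * 12            # ~14 s for a 12-step DDIM run
```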


Metal is such an advantage; I had no idea.




