On an Apple M1 with 16 GB of RAM, without using PyTorch compiled to take advantage ...

ilkke · on July 20, 2023

Prompt length shouldn't influence creation time; at least it didn't in any of the implementations I've used.

What is the resolution of your images and number of steps?

wsgeorge · on July 20, 2023

Defaults from the Hugging Face repo, just copy-pasted. So, IIRC, 50 steps and a 512x512 image.

Edit: confirmed.

> Prompt length shouldn't influence creation time...

Yeah, checks out with my experience too. Longer prompts were truncated.

Filligree · on July 20, 2023

Some tools (e.g. Automatic1111) are able to feed in longer prompts, but then the prompt length does affect the speed of inference.

Albeit in 77-token increments.
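A rough sketch of that arithmetic, assuming each 77-token window spends two slots on start/end markers and so carries 75 prompt tokens (my understanding of how such tools chunk prompts, not something checked against their source). The function name is made up for illustration:

```python
import math

CLIP_CONTEXT = 77      # tokens per text-encoder pass, including start/end markers
USABLE_PER_CHUNK = 75  # assumption: 77 minus the start and end tokens


def encoder_passes(prompt_tokens: int) -> int:
    """Hypothetical helper: how many 77-token chunks a prompt would need."""
    # Even an empty prompt still costs one pass through the text encoder.
    return max(1, math.ceil(prompt_tokens / USABLE_PER_CHUNK))


for n in (10, 75, 76, 150, 151):
    passes = encoder_passes(n)
    print(f"{n} prompt tokens: {passes} pass(es), {passes * CLIP_CONTEXT} encoder tokens")
```

So under that assumption the cost is a step function: nothing changes until the prompt spills into the next chunk.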

microtonal · on July 20, 2023

And PyTorch on the M1 (without Metal) uses the fast AMX matrix multiplication units (through the Accelerate framework). Matrix multiplication on the M1 is on par with ~10 threads/cores of a Ryzen 5900X [1].

[1] https://github.com/danieldk/gemm-benchmark#example-results
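For anyone comparing matrix multiplication numbers across machines, throughput is usually derived from the standard operation count of 2·m·n·k floating-point operations for an (m×k)·(k×n) product. A minimal sketch of that conversion; the function name and the timing in the example are made up for illustration, not measurements:

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert a matmul wall-clock time into GFLOP/s.

    An (m x k) @ (k x n) product does m*n*k multiplies and about as many
    adds, hence the conventional 2*m*n*k operation count.
    """
    return 2.0 * m * n * k / seconds / 1e9


# Hypothetical timing: a 1000x1000x1000 product taking 1 second.
print(gemm_gflops(1000, 1000, 1000, 1.0))
```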

hospitalJail · on July 21, 2023

Wtf, my 4-year-old, $400 crappy low-wattage computer can generate a picture in a minute or two.

DDIM, 12 steps.

asynchronous · on July 20, 2023

Metal is such an advantage, had no idea