
I really love everything you're doing!

Personal request: could you also advocate for "image previz rendering"? I feel it's an extremely compelling use case for these companies to develop. Basically, it's any 2D/3D compositor that allows you to visually block out a scene, then rely on the model to precisely position the set, set pieces, and character poses.

If we got this task onto benchmarks, the companies would absolutely start training their models to perform well at it.

Here are some examples:

gpt-image-1 absolutely excels at this, though you don't have much control over the style and aesthetic:

https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij...

Nano Banana (Pro) fails at this task:

https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd

Flux Kontext, Qwen, etc. have mixed results.

I'm going to re-run these under gpt-image-1.5 and report back.

Edit:

gpt-image-1.5:

https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U

And just as I finish this, Imgur deletes my original gpt-image-1 post.

Old link (broken): https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh

Hopefully Imgur doesn't break these too. I'll have to start blogging and keep these somewhere I control.





Thanks! A highly configurable Previz2Image model would be a fantastic addition. I was literally just thinking about this the other day (but more in the context of ControlNets and posable kinematic models). I’m even considering adding an early CG Poser blocked‑out scene test to see how far the various editor models can take it.

With additions like structured prompts (introduced in BFL Flux 2), maybe we'll see something like this in the near future.



