Hacker News

> Image inputs are still a research preview and not publicly available.

Will input images also be tokenized? Multimodal input is still an area of research, but an image could instead be converted into a text description (?) before being inserted into the input stream.



My understanding is that the image embedding is included directly, rather than the image being converted to text.
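
To make that concrete, here is a toy sketch of what "including the embedding" could look like. I don't know how the actual model does it, so everything here is an assumption for illustration: the patch size, the embedding width, the random projection weights, the three-word vocabulary, and the choice to put image vectors before text vectors are all made up. The only point is that image patches and text tokens both end up as vectors of the same width in one input sequence, with no text description in between.

```python
import random

EMBED_DIM = 4  # hypothetical embedding width
PATCH = 2      # hypothetical patch side length, in pixels


def embed_text(token_ids, table):
    """Look up one embedding vector per text token id."""
    return [table[t] for t in token_ids]


def embed_image(pixels, weights):
    """Cut a grayscale image into PATCH x PATCH patches and linearly project each one."""
    height, width = len(pixels), len(pixels[0])
    vectors = []
    for r in range(0, height, PATCH):
        for c in range(0, width, PATCH):
            # Flatten the patch into a list of PATCH * PATCH pixel values.
            flat = [pixels[r + i][c + j] for i in range(PATCH) for j in range(PATCH)]
            # One dot product per output dimension (a plain linear projection).
            vectors.append([sum(x * w for x, w in zip(flat, row)) for row in weights])
    return vectors


rng = random.Random(0)  # fixed seed; the weights are random, not learned
vocab = {"describe": 0, "this": 1, "sign": 2}
table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]
weights = [[rng.uniform(-1, 1) for _ in range(PATCH * PATCH)] for _ in range(EMBED_DIM)]

image = [
    [0.0, 0.5, 1.0, 0.5],
    [0.5, 1.0, 0.5, 0.0],
    [1.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 1.0],
]

text_vectors = embed_text([vocab[word] for word in ["describe", "this", "sign"]], table)
image_vectors = embed_image(image, weights)

# A single input sequence: image patch vectors followed by text token vectors.
sequence = image_vectors + text_vectors
print(len(sequence), len(sequence[0]))  # 7 4
```

The 4x4 image yields four 2x2 patches, so the sequence holds four image vectors plus three text vectors, each of width 4.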


My understanding is that image embeddings are a rather abstract representation of the image. What if the image itself contains text, such as a street sign?



