If they are using popular images from the internet, then I strongly suspect the answers come from the text next to the known image. The man ironing on the back of the taxi has the same issue. https://google.com/search?q=mobile+phone+charger+resembling+...
I would bet good money that when we can test prompting with our own unique images, GPT4 will not give similar quality answers.
They literally sent it 1) a screenshot of the Discord session they were in and 2) an audience-submitted image.
It described the Discord image in incredible detail, including what was in it, which channels they were subscribed to, and how many users were present. And for the audience image, it correctly described it as an astronaut on an alien planet, with a spaceship on a distant hill.
99% of the commenters here don't have an iota of a clue what they are talking about.
There's easily a 10:1 ratio of "it doesn't understand, it's just fancy autocomplete" to the alternative, despite peer-reviewed research published months ago by Harvard and MIT researchers demonstrating that even a simplistic GPT model builds world representations from which it draws its responses, rather than simply guessing by frequency.
Watch the livestream?! But why would they do that, when they already know it's not very impressive and not worth their time beyond commenting on it online.
I imagine this is coming from some sort of monkey brain existential threat rationalization ("I'm a smart monkey and no non-monkey can do what I do"). Or possibly just an overreaction to very early claims of "it's alive!!!" in an age when it was still just a glorified Markov chain. But whatever the reason, it's getting old very fast.
>published peer reviewed research from Harvard and MIT researchers months ago
Curious, source?
EDIT: Oh, the Othello paper. Be careful extrapolating that too far. Notice they didn't ask it to play the same game on a board of arbitrary size (something that would be easy for a model with genuine world understanding).
In the livestream demo they did something similar but with a DALL-E-generated image of a squirrel holding a camera, and it was still able to explain why it was funny. Since the image was generated by DALL-E, it clearly doesn't appear anywhere on the internet with text explaining why it's funny. So I think this is perhaps not the only possible explanation.
It didn't correctly explain why it was funny, though: the joke is that it's a squirrel "taking a picture of his nuts", where "nuts" means literal nuts rather than what that phrasing usually implies.
What's funny is that neither GPT-4 nor the host noticed that (or maybe the host did notice but didn't want to bring it up, it being "inappropriate" humor).
That interpretation never occurred to me either, actually. I suppose it makes more sense. But since it didn't occur to me, I can cut GPT4 some slack. It came up with the same explanation I would have.
> I would bet good money that when we can test prompting with our own unique images, GPT4 will not give similar quality answers.
I do wonder how misleading their paper is.