Agreed. But Othello move syntax happens to map perfectly onto the set of possible Othello moves, so the most efficient representation of the information contained within Othello is a data structure that maps onto an Othello board. Other information, like what the players look like, would, from the point of view of evaluating the next Othello move, just be noise.
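To make that concrete, here's a minimal sketch (my own illustration, nothing to do with the actual Othello-GPT internals): standard move notation like "d3" is already an address into an 8x8 grid, so the move vocabulary and the most compact board representation share the same index space.

    # Illustrative only: standard Othello move notation ("a1".."h8") is
    # literally a coordinate on an 8x8 grid, so the move tokens and the
    # board's storage layout coincide.
    from typing import List, Optional, Tuple

    Board = List[List[Optional[str]]]  # 8x8 grid; each cell is None, "black" or "white"

    def empty_board() -> Board:
        return [[None] * 8 for _ in range(8)]

    def move_to_cell(move: str) -> Tuple[int, int]:
        """Map a move token like 'd3' straight to (row, col) board indices."""
        col = ord(move[0].lower()) - ord("a")  # 'a'..'h' -> 0..7
        row = int(move[1]) - 1                 # '1'..'8' -> 0..7
        return row, col

    board = empty_board()
    r, c = move_to_cell("d3")
    board[r][c] = "black"  # the token "d3" is the storage address (flipping rules omitted)

Flipping rules and legality checks are left out; the point is just that the token space and the state space line up one-to-one.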
Human language doesn't map perfectly or even particularly closely to the physical world we inhabit or the emotions we experience, though, so a maximally efficient model of human language will overfit to useless semantic features whilst lacking context on what the foo that follows baz actually is. Understanding the physical or emotional world solely from human language is more like trying to use the Othello LLM's state representation to establish the colour of the board.
The sense data your brain ingests does not map perfectly or even particularly closely to the real physical world we inhabit.
Whether it’s through language, vision (which feels quite “real” but is really just a 2D projection of light that we interpret), sound or anything else, it’s all just some byproduct of the world that we nonetheless can make useful predictions with.
There is enough information in all the text ever written by humans about the world to build "a" world model. Not necessarily the "one true model" that is your own personal life experience of the world since birth.
>Understanding the physical or emotional world solely from human language is more like trying to use the Othello LLM's state representation to establish the colour of the board
Nobody truly understands the physical world. Don't you think the birds that can sense the magnetic field around the Earth and use it to guide their travels would tell you your model was fundamentally incorrect?
Certainly, LLMs are more limited in their input data, but it's not a fundamental difference, and adding more modalities is trivial.
> The sense data your brain ingests does not map perfectly or even particularly closely to the real physical world we inhabit.
I never argued otherwise, though being aware that there is a physical world that I can interact with helps! The point is that the only reason the LLM's transformation of syntax approximated an Othello board was the unusually perfect correspondence between permutations, syntax and efficient storage, a correspondence that seldom exists elsewhere. In other circumstances your LLM vectors are modelling language constructs, lies and other abstractions that only incidentally touch on world and brain state.
The term "understanding" is generally used by humans to refer to how humans model reality[1] and need not imply completeness. But it also implies that a model isn't extremely dissimilar to humans in what its parsed and how its parsed it. Or to slightly alter your example, if a bat argued that following the bat swarm well enough to locate the exit didn't mean humans had achieved "true echolocation", I'd have to agree with them.
I mean, a photograph and a pocket calculator are also representations of some aspect of the state of the world, sometimes even representing a particular subset of the world's information with sufficient fidelity to allow humans to make much better predictions about it. But fewer people seem to wish to stan for the capacity of the calculator or the bitmap to have "real understanding" of the outputs they emit, even though fundamentally the LLM has much in common with them and far less in common with the human...
[1] The potential for debate around such definitions underlines the paucity of language...