Gary Marcus has a very particular view of human psychology that was popular for a while, especially in the early 2000s. Major proponents include Steven Pinker, Noam Chomsky, and Jerry Fodor. This view was heavily influenced by symbolic computers when symbolic computers were new, and so it held some mindshare leading into the dotcom boom. For several reasons, one of which is the replication crisis, this view is no longer nearly as popular.
One of the major beliefs of this view is that LLMs are essentially impossible because there's not enough information in language to learn it unless you have a special purpose language-learning module built into the brain by evolution. This is Chomsky's "poverty of the stimulus" argument.
Marcus still defends this view and because of this bias is committed to trying to prove to everyone that LLMs are not possible or at least that they're some kind of illusion. There's a sense in which they threaten his fundamental concept of how the brain works.
In proposing and defending these views he appears to me and others to be having a sort of internal crisis that's playing out publicly where his need to be right about this is coloring his judgment and objectivity.
> trying to prove to everyone that LLMs are not possible or at least that they're some kind of illusion
This is such poor phrasing I can't help but wonder if it was intentional. The argument is over whether LLMs are capable of AGI, not whether "LLMs are possible".
You also 100% do not have to buy into Chomsky's theory of the brain to believe LLMs won't achieve AGI.
Your reasoning seems clear enough to me. cmiiw, you’re saying Marcus says LLMs don’t really understand language and only present an illusion of that understanding. And that illusion will noticeably break at a certain scale. And to be honest when context windows get filled up to a certain point, they do become unintelligible and stupid.
In spite of this, I think LLMs display intelligence and for me that is more useful than their understanding of language. I haven’t read anything from Chomsky tbh.
The utility of LLMs comes from their intelligence and the price point at which it is achieved. Ideally the discussion should focus on that. The deeper discussion of AGI should not worry policy makers or the general public. But unfortunately, business seems intent on diving into the philosophical arguments of how to achieve AGI because that is the logic they have chosen to persuade people to give them more and more capital. And that is what makes Marcus’ and his friends’ critique relevant.
One can’t really dismiss people like Marcus as academic and pedantic about LLM capabilities (are they real, are they not, etc.) when the money is relentlessly chasing those unachieved capabilities.
So even though you’re saying we aren’t talking about AGI and this isn’t the topic, everything kind of circles back into AGI and the amount of money being poured into chasing that.
I would appreciate it if you and the GP did not personally insult me when you have a question, though. You may feel that you know Marcus to be into one particular thing, but some of us have been familiar with his work since long before he pivoted to AI.
I'm sorry, I didn't mean to insult you. To explain the reason: you seem to use some particular wordings that just seem strange to me, such as first saying that Marcus's position is that "LLMs are impossible", which is either false or an incredibly imprecise shorthand for "AGI using LLMs is impossible", and then claiming it was beautiful.
I didn't mean to attack you personally and I'm really sorry if it sounded that way. I appreciate the generally positive atmosphere on HN and I believe it is more important than the actual argument, whatever it may be.
The first is that your phrasing "that LLMs are not possible or at least that they're some kind of illusion" collapses the claim being made to the point where it looks as if you're saying Marcus believes people are just deluded that something called an "LLM" exists in the world. But even allowing for some inference as to what you actually meant, it remains ambiguous whether you are talking about language acquisition (which you are in your 2nd paragraph) or the genuine understanding and reasoning / robust world model induction necessary for AGI, which is the focus of Marcus' recent discussion on LLMs, and why we're even talking about Marcus here in the first place.
You seem more familiar with Marcus' thinking on language acquisition than I, so I can only assume that his thinking on language acquisition and LLMs is somewhat related to his thinking on understanding and reasoning / world model induction and LLMs. But it doesn't appear to me, based on what I've read of Marcus, that his claims about the latter really depend on Chomsky. Which brings me to the 2nd problem with your post, where you make the uncharitable claim that "he appears to me and others to be having a sort of internal crisis that's playing out publicly", as if it were simply impossible to believe that LLMs are not capable of genuine understanding / robust world model induction otherwise.
But you need to agree that we can single out one specific argument he is most famous for, right? And when Josh Wolfe is saying that "Apple Gary Marcus'd LLM's reasoning ability", it does refer to a very specific issue of whether LLMs can lead to AGI or not.
I don't know whether he's right or wrong, but deep down I feel the amount of money funneled into the Transformer architecture might have blocked other approaches, both potentially promising ones and others doomed to failure, just because of LLMs' quick wins.
Chomsky and Pinker (I don't know Fodor) are entertainers, and their theories are good only for PhD pumpers. How the human brain works, and the associated neuroscience, has absolutely nothing to do with LLMs, which are simply empirically defined function approximations.
The problem with Sam Altman is that he is a shyster and leech feeding off the hard work of thousands of programmers and engineers.
> One of the major beliefs of this view is that LLMs are essentially impossible because there's not enough information in language to learn it unless you have a special purpose language-learning module built into the brain by evolution. This is Chomsky's "poverty of the stimulus" argument
The argument is that there is not enough information available to a child to do this. So even if we grant the dubious premise that LLMs have learned to speak languages in a manner analogous to humans, they are not a counterexample to Chomsky’s poverty of the stimulus argument because they have been trained on a vast array of linguistic data that is not available within a single human childhood.
If you want to better understand Chomsky’s position, it’s easiest to do so in relation to other animals. Why are other intelligent animals not able to learn human languages? The rather unsurprising answer, in Chomsky’s view, is that humans have a built-in linguistic capacity, rather in the way that e.g. bats have a built-in capacity for echolocation. The claim that bats have a built-in capacity for echolocation is not refuted by the existence of sonar. Likewise, our ability to construct machines that mimic some aspects of human linguistic capacity does not automatically refute the hypothesis that this is a specific human capacity absent in other animals.
Imagine if sonar engineers were constantly shitting on chiropterologists because their silly theory of bats having evolved a capacity for echolocation has now been refuted by human-constructed sonar arrays. The argument makes so little sense that it’s difficult to even imagine the scenario. But the argument against Chomsky from LLMs doesn’t really make any more sense, on reflection.
Chomsky hasn’t helped his case in recent years by tacking his name on some dumb articles about LLMs that he didn’t actually write. (A warning to us all that retirement is a good idea.) So I don’t blame people who are excited about LLMs for seeing him as a bit of a rube, but the supposed conflict between Chomsky and LLMs is entirely artificial. Chomsky is (was) trying to do cognitive science. People experimenting with LLMs are not, on the other hand, making any serious effort to study how humans acquire language, and so have very little of substance to argue with Chomsky about. They are just taking opportunistic pot shots at a Big Name because it’s a good way to get attention.
For the record, Chomsky himself has never made any very specific claims about a dedicated module in the brain or about the evolutionary origins of human linguistic capacity (except for some skeptical comments on it being a gradual adaptation).
There was a large literature on language acquisition prior to the invention of LLMs that showed that Chomsky's argument likely wasn't correct. This is in addition to the fact that he significantly underestimated the amount of linguistic input children receive.
There's too much to hash out here on HN. You can try to save the LAD argument by a strategic retreat, but it's been in retreat for decades now and keeps losing ground. It's clear that neural networks can learn the rules of grammar without specifically baking grammatical hierarchies into the network. You can retreat to saying it's about setting hyperparameters or priors, but even the evidence for that is marginal.
There are certainly features of the brain that make language learning easier (such as size) but POS doesn't really provide anything to guide research and is mostly of historical interest now. It's a claim that something is impossible, which is a strong claim. And the evidence for it is poor. It's not clear it would have any adherents if it were proposed anew today. And this is all before LLMs enter the picture.
The research from neuroscience, learning theory, machine learning, etc. has all pointed toward a view of the brain that is significantly different from the psychological nativism view. When many prominent results in the nativist camp failed to replicate during the replication crisis, most big-name researchers pivoted to other fields. Marcus is one of the remaining vocal holdouts for nativism. And his beliefs about AI align very closely with all the old debates about statistical learning models vs symbolic manipulation etc.
> Why are other intelligent animals not able to learn human languages?
Animals and plants do communicate with each other in structured ways. Animals can learn to communicate with humans. This is one of those areas where you can choose to try to see the continuities with communication or you can try to define a vision of language that isolates human language as completely set apart. I think human language is more like an outlier in complexity to the communication animals do rather than a fundamentally different thing. In that sense there's not much of a mystery given brain size, number of neurons, sociality etc.
> The argument is that there is not enough information available to a child to do this
Yes, but children are the humans who learn language in the typical case. So you can replace "child" with "human", especially with all the hedging I did in my first post (e.g. "essentially"). As I said above, Chomsky is known to have underestimated the amount of input babies receive. Babies hear language from the moment they're born until they learn to speak. Also, as a parent, I often correct grammatical and other mistakes as toddlers learn to talk. Other parents do the same. Part of the POS is based on the premise that children don't get their grammar corrected often.
Yes, lots of people have argued that Chomsky is wrong about various things for various reasons and at various times. The point of my post was not to get into all of those historical arguments, but to point out that recent developments in LLMs are largely irrelevant. But I'll briefly respond to some of your broader points.
You mention 'neural networks' learning rules of grammar. Again, this is relevant to Chomsky's argument only to the extent that such devices do so on the basis of the kind of data available to a child. Here you implicitly reference a body of research that's largely non-existent. Where are the papers showing that neural networks can learn, say, ECP effects, ACD, restrictions on possible scope interpretations, etc. etc., on the basis of a realistic child linguistic corpus?
Your 'continuities' argument cuts both ways. There are continuities between human perception and bat perception and between bat communication and human communication; but we still can't echolocate, and bats still can't hold conversations. The specifics matter here. Is bat echolocation just a more complex variant of my very slight ability to sense whether I'm in an enclosed location or an outdoor space when I have my eyes closed? And is the explanation for why bats but not humans have this ability that bat cognition is just more sophisticated than human cognition? I'm sure neural networks can be trained to do echolocation too. Humans can train an artificial network to do echolocation, therefore it can't be a species-specific capacity of bats. << This seems like a terrible argument, no?
Poverty of the stimulus arguments don't really depend at all on the assumption that parents don't correct children, or that children ignore such corrections. If you look at specific examples of the kind of grammatical rules that tend to interest generative linguists (e.g. ACD, ECP effects, ...) then parents don't even know about any of these, and certainly aren't correcting their children on them.
Chomsky has never made any specific estimate of the 'amount' of input that babies receive, so he certainly can't be known to have underestimated it. Poverty of the stimulus arguments are at heart not quantitative but rather are based on the assumption that certain specific kinds of data are not likely to be available in a child's input. This assumption has been validated by experimental and corpus studies (e.g. https://sites.socsci.uci.edu/~lpearl/courses/readings/LidzWa...)
> Babies hear language from the moment they're born until they learn to speak
I can assure you that this insight is not lost on anyone who works on child language acquisition :)
A realistic child linguistic corpus for a 2 year old starting to form sentences would be about 15 million words over the course of their lifetime. Converted to LM units that's maybe about 20 million tokens. There are small language models trained on sets that small.
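For anyone who wants to sanity-check the arithmetic, here is a rough back-of-envelope sketch. The 4/3 tokens-per-word ratio is my own assumed rule of thumb for English subword tokenizers, not a figure from any cited source, and the function names are purely illustrative.

```python
# Back-of-envelope check of the corpus-size figures above.
# ASSUMPTION: roughly 4/3 subword tokens per English word (a rule of thumb,
# not a measured value; real tokenizers vary).
WORDS_HEARD_BY_AGE_2 = 15_000_000
ASSUMED_TOKENS_PER_WORD = 4 / 3
DAYS_IN_TWO_YEARS = 2 * 365


def words_to_tokens(words, tokens_per_word=ASSUMED_TOKENS_PER_WORD):
    """Convert a word count to an approximate token count."""
    return round(words * tokens_per_word)


def words_per_day(words, days=DAYS_IN_TWO_YEARS):
    """Average daily word exposure implied by a cumulative word count."""
    return words / days


print(words_to_tokens(WORDS_HEARD_BY_AGE_2))       # approximate token count
print(round(words_per_day(WORDS_HEARD_BY_AGE_2)))  # implied words heard per day
```

Under that assumed ratio, 15 million words comes out at about 20 million tokens, which implies a bit over 20,000 words heard per day across the first two years.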
Some LMs are specifically trained on child-focused small corpora in the 10 million range, e.g. BabyLM: https://babylm.github.io.
Keep in mind that before age 2, children are using individual words and getting much richer feedback than LMs are.
Humans can and do echolocate: https://en.wikipedia.org/wiki/Human_echolocation. There are also anatomical differences that are not cognitive that affect the abilities like echolocation. For example, the positioning and frequency response of sensors (e.g. ears) can affect echolocation performance.
Yes, humans can echolocate to a limited extent, just as some animals have very limited analogs of human language. That was the point of the comparison. It is no more sensible to view human language as just a more complex variant of vervet monkey calls than it is to view bat echolocation as just a more complex variant of whatever limited capacity humans have in that area. There is continuity viewed from the outside, if you squint a little, but that's unlikely to correspond to continuity in terms of the underlying cognitive mechanisms. Bats, for example, can make precise calculations of distance based on a built-in reference for the speed of sound: https://www.pnas.org/doi/10.1073/pnas.2024352118
Children don't get 'rich feedback' at all on the grammatical structure of their sentences. I think this idea is probably based on a misconception of what 'grammar' is from a generative linguistics perspective. When was the last time that a child got rich feedback on their misinterpretation of an ACD construction? https://www.bu.edu/bucld/files/2011/05/29-SyrettBUCLD2004.pd...
LLMs trained on small datasets don't perform that well from the point of view of language acquisition – even up to 100 million tokens. There's not a very large literature on this because, as I said, there are many more people interested in making a drive-by critique of generative linguistics than there are people who are genuinely interested in investigating different models of child language acquisition. But here is one suggestive result: https://aclanthology.org/2025.emnlp-main.761.pdf See also the last paragraph of p.6 onwards of https://arxiv.org/pdf/2308.03228
The other point that's often missed in evaluations of these models is their capacity for learning completely non-human-like languages. Thus, the BabyLM models have some limited success in learning (for example) some island constraints, but could just as easily have acquired languages without island constraints. That then leaves the question of why we do not see human languages without such constraints.
> Children don't get 'rich feedback' at all on the grammatical structure of their sentences.
They probably do get parents and the like correcting them or giving an example. Kid says "we goed fish", adult says "yeah, we went fishing". I taught English as a foreign language a bit, and people learn almost entirely from examples like that rather than from talking about ellipsis or any sort of grammar jargon.
It seems brains / neurons / LLMs are good at pattern recognition. Brains probably quicker on the uptake than LLM backpropagation though.
That particular example is irrelevant to poverty of the stimulus arguments because no-one has ever suggested that kids acquiring English lack evidence for the irregular past tense of ‘go’.
See above for some examples of the kinds of grammatical principles that can form the basis of a poverty of the stimulus argument. They’re not generally the kind of thing that parental corrections could conceivably help with, for two reasons:
1) (The main reason) Poverty of the stimulus arguments relate to features of grammatical constructions that are rarely exemplified. As examples are rarely uttered, deviant instances are rarely corrected, even assuming the presence of superlatively wise and attentive caregivers.
2) (The reason that you mention) Explicit instruction on grammatical rules has almost no effect on most people, especially young children. So corrections at most add a few more examples of bad sentences to the child’s dataset, which they can probably obtain anyway via more indirect cues.
If corrections were really effective, someone should be able to do a killer experiment where they show an improved (i.e. more adult-like) handling of, say, quantifier scope in four year olds after giving them lots of relevant corrections. I am open minded about the outcome of such an experiment, but I’d bet a fairly large amount of money that it would go nowhere.
Gary Marcus is not changing the story financiers are telling each other about AI. He has been telling the same story without making a dent in their stories. And that is because he is not in the arena. What he says, does and thinks doesn't matter.