No discussion of Schmidhuber is complete without the infamous debate at NIPS 2016 https://youtu.be/HGYYEUSm-0Q?t=3780 . One of my goals as an ML researcher is to publish something and have Schmidhuber claim he's already done it.
But more seriously, I'm not a fan of Schmidhuber because even if he truly did invent all this stuff early in the 90s, his inability to see its application to modern compute held the field back by years. In principle, we could have had GANs and self-supervised models years earlier if he had "revisited his early work". It's clear to me no one read his early papers when developing GANs/self-supervision/transformers.
> his inability to see its application to modern compute held the field back by years.
I find Schmidhuber's claim on GANs to be tenuous at best, but his claim to have anticipated modern LLMs is very strong, especially if we are going to be awarding Nobel Prizes for Boltzmann Machines. In https://people.idsia.ch/%7Ejuergen/FKI-147-91ocr.pdf, he really does concretely describe a model that unambiguously anticipated modern attention (technically, either an early form of hypernetworks or a more general form of linear attention, depending on which of its proposed update rules you use).
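To make the linear-attention connection concrete, here is a minimal sketch (my own numpy illustration, not code from the 1991 paper) of why the "fast weight programmer" idea lines up with unnormalized causal linear attention: a slow network emits keys and values, a fast weight matrix accumulates their outer products, and reading that matrix out with a query reproduces exactly the sum a linear-attention layer computes. I'm assuming the purely additive update rule here; the paper also discusses alternatives.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4                      # sequence length, feature dimension
K = rng.standard_normal((T, d))  # keys emitted by the "slow" network
V = rng.standard_normal((T, d))  # values emitted by the "slow" network
Q = rng.standard_normal((T, d))  # queries

# Fast-weight view: additively "program" a weight matrix with outer
# products, then read it out with the current query.
W = np.zeros((d, d))
fast_out = []
for t in range(T):
    W += np.outer(V[t], K[t])    # additive fast-weight update
    fast_out.append(W @ Q[t])    # readout with the query
fast_out = np.stack(fast_out)

# Linear-attention view: for each step, sum past values weighted by
# the key-query dot product (causal, no softmax).
attn_out = np.stack([
    sum(V[s] * (K[s] @ Q[t]) for s in range(t + 1))
    for t in range(T)
])

assert np.allclose(fast_out, attn_out)
```

The two loops compute the same thing because W after step t is just the sum of the outer products v_s k_s^T for s <= t, so W @ q_t expands to the causal attention sum.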
I also strongly disagree with the idea that his inability to practically apply his ideas held anything back. In the first place, it is uncommon for a discoverer or inventor to immediately grasp all the implications and applications of their work. Secondly, the key limiter was parallel processing power; it's not a coincidence that ANNs took off around the same time GPUs were transitioning away from fixed-function pipelines (and Schmidhuber's lab was a pioneer there too).
In the interim, when most people derided neural networks, his lab was one of the few that kept research on neural networks and their application to sequence learning going. Without their contributions, I'm confident Transformers would have happened later.
> It's clear to me no one read his early papers when developing GANs
This is likely true.
> self-supervision/transformers.
This is not true. Transformers came after lots of research on sequence learners, meta-learning, generalizing RNNs, and adaptive alignment. For example, Alex Graves' work on sequence transduction with RNNs eventually led to the direct precursor of modern attention, and Graves' work was itself influenced by work done with, and by, Schmidhuber.
It's very common in science for people to have had results they didn't understand the significance of that later were popularized by someone else.
There is the whole thing with Damadian claiming to have invented MRI (he didn't) when the Nobel Prize went to Mansfield and Lauterbur (see the Nobel Prize section of the linked article).
https://en.m.wikipedia.org/wiki/Paul_Lauterbur
And I've seen other less prominent examples.
It's a lot like the difference between ideas and execution, and people claiming someone "stole" their idea because that person built a successful business from it.
Given that you're a researcher yourself, I'm surprised by this comment. Have you not experienced the harsh rejection of "not novel"? That sounds like a great way to get stuck in review hell. (I know I've experienced this even when doing novel work, just by relating it too closely to other methodologies when explaining it: "oh, it's just ____".)
The other part seems weird too. Who isn't upset when their work doesn't get recognized and someone else gets the credit? Are we not all human?
I think he did understand both the significance of his work and the importance of hardware. His group pioneered porting models to GPUs.
But personal circumstances matter a lot. He was stuck at IDSIA in Lugano, i.e. a relatively small and not-so-well-funded academic institution.
He could have done much better in industry, with access to lots of funding, a bigger headcount, and serious infrastructure.
Ultimately, models matter much less than infrastructure. Transformers are not that important; other architectures such as deep SSMs or xLSTM can achieve comparable results.
I don't understand how he's at fault for the field being behind where it maybe could've been, especially the language "held back". Did he actively discourage people from trying his ideas as compute grew?