That's an incredibly misleading graph. Never judge a growing series on a linear scale: anything growing at a constant rate looks like a "hockey stick" when plotted that way. If you look at the same series on a log scale, you see high inflation in the '70s, but from the '80s on it looks pretty similar to the pre-'70s trend.
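A quick way to see the effect for yourself (toy numbers, nothing to do with the actual series behind that graph):

```python
# Toy example: a series growing at a constant 3% per year.
# On a linear axis it looks like a "hockey stick"; on a log axis
# constant growth is a straight line, so genuine accelerations stand out.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1950, 2025)
series = 100 * 1.03 ** (years - years[0])

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ax_lin.plot(years, series)
ax_lin.set_title("Linear scale: 'hockey stick'")
ax_log.plot(years, series)
ax_log.set_yscale("log")
ax_log.set_title("Log scale: straight line")
plt.show()
```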
That graph showing zero growth without "tech spending" seems to indicate that.
What puzzles me is that a human brain runs on only about 20 watts of power, plus food, shelter, relationships, medical care, and 20 years of experiential and educational training. So why do LLMs need so much more power, if they're piggybacking on digital human training data?
Is your comment supposed to be pessimistic or optimistic? My interpretation of your post is that we should be investing more into AI because eventually we will achieve human-like AI running on 20 watts of power. We know it's possible because, as you said, our brains are doing it.
If you already know a word, just mark it as "already known". If you already know all the words it's showing you, that will cause the difficulty to ramp up very quickly as it starts skipping ahead to find words you might not know. (If you scroll down to the bottom of the page and open the "graphs" section, you can see the logic behind it.)
The app "only" includes about the 3,000 most common words, so if you're past that level, I don't know how helpful it will be to you. I can easily extend this in the future; I just need a bigger corpus with more data.
The ramp-up feels rather slow. For 3,000 words, a binary search should take about 12 steps (log2 3000 ≈ 11.6) to find the right level. Maybe it's because you add 5 new words each time, turning those 12 steps into 60 words instead?
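Something like this is what I have in mind; `knows_word` is just a stand-in for the user tapping "already known" or not, not anything from your app:

```python
def estimate_vocab_rank(knows_word, num_words=3000):
    """Binary-search the frequency rank where the user stops knowing words.

    knows_word(rank) is a hypothetical callback: True if the user marks
    the word at that frequency rank as already known.
    """
    lo, hi = 0, num_words
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if knows_word(mid):   # known -> their level is further down the list
            lo = mid + 1
        else:                 # unknown -> their level is at or above this rank
            hi = mid
    return lo, steps          # at most ceil(log2(3000)) = 12 probes
```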
Also, I'm confused that you say you would need a bigger corpus for more words, since your readme says that you use the OpenSubtitles data from OPUS. Their 2024 release has tens of millions of sentences for each language, which surely should be enough for tens of thousands of unique words?
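For scale, even a naive frequency count over one of those extracted sentence-per-line files (the filename here is just a placeholder) should give a ranked list going well past 3,000 words:

```python
from collections import Counter

counts = Counter()
with open("opensubtitles.de.txt", encoding="utf-8") as f:  # placeholder path
    for line in f:
        # crude tokenisation, but good enough for a frequency ranking
        counts.update(w.lower().strip('.,!?";:()') for w in line.split())

print(f"{len(counts):,} unique tokens")
print(counts.most_common(20))
```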
Thanks for the feedback. I’ve tuned the initial ramp-up to be more aggressive and made it add words in smaller increments at first. Now, after adding 16 words and marking them all as known, it skips to the ~500th most common word. After adding 25, it skips to the ~100th.
What you say about binary search is a good point. I initially used something closer to a straightforward binary search, but the issue was that the ramp-up was too quick and beginner users would end up adding a bunch of words that were way too advanced for their level. So I tried to make it less aggressive to avoid overshooting, but I guess that has the opposite issue of taking longer for advanced users. I’ll think about what I can do about that.
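For what it's worth, one possible middle ground (not what the app does, just a sketch): grow the jump size only while the user keeps marking words as known, and drop back to single steps the moment they miss one.

```python
def next_rank(current_rank, knew_last_word, streak, max_rank=3000):
    """streak = number of consecutive 'already known' answers so far."""
    if knew_last_word:
        jump = min(2 ** streak, 500)   # roughly exponential ramp, capped
        return min(current_rank + jump, max_rank - 1)
    return current_rank + 1            # missed one: back to small steps
```

With a cap around 500, someone who knows everything clears the first ~1,000 ranks in about nine taps, while a beginner who starts missing words quickly falls back to adding them one at a time.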
For the corpus, I prefer to use Neri’s sentence lists, as they’re much higher quality than OpenSubtitles. You’d be surprised at the problems it has. So I only use OpenSubtitles for Korean (because Neri’s sentence lists don’t have a Korean version).