LLMs are enormously bandwidth-hungry. You have to shuffle your 800 GB neural network in and out of memory for every token, which can take more time and energy than actually doing the matrix multiplies. Even GPUs are barely high-bandwidth enough.
But even so, for a single user, the output rate for a very fast LLM would be something like 100 tokens per second. With graphics, we're talking about 2 million pixels, 60 times a second: 120 million pixels per second for a standard high-res screen. Big difference between 100 tokens and 120 million pixels per second.
24-bit pixels give 16 million possible colors... For tokens, that many values is probably enough to represent every word in the combined vocabulary of every major national language on earth.
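To put rough numbers on that comparison, here's a back-of-the-envelope sketch; the 100 tokens/s figure and the ~2-megapixel, 60 Hz display are just the values quoted above, and ~17 bits per token assumes a vocabulary of around 128k entries:

```python
# Back-of-the-envelope output data rates, using the figures from the comment above.
tokens_per_sec = 100          # fast single-user LLM decode
bits_per_token = 17           # 2**17 = 131072, roughly a modern LLM vocabulary

pixels_per_frame = 2_000_000  # "standard high res screen"
frames_per_sec = 60
bits_per_pixel = 24

llm_bits_per_sec = tokens_per_sec * bits_per_token
display_bits_per_sec = pixels_per_frame * frames_per_sec * bits_per_pixel

print(f"LLM text output: {llm_bits_per_sec:,} bits/s")      # ~1,700 bits/s
print(f"Display output:  {display_bits_per_sec:,} bits/s")  # 2,880,000,000 bits/s
print(f"Ratio: ~{display_bits_per_sec // llm_bits_per_sec:,}x")
```

The asymmetry is the point: the LLM's tiny output rate still requires enormous internal memory traffic per token, which is what the bandwidth complaint above is about.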
> You have to shuffle your 800GB neural network in and out of memory
Do you really though? That seems more like a constraint imposed by graphics cards. A specialized AI chip would just keep the weights and all parameters in memory/hardware right where they are and update them in-situ. It seems a lot more efficient.
I think that it's because graphics cards have such high bandwidth that people decided to use this approach but it seems suboptimal.
But if we want to be optimal, then ideally only the inputs and outputs would need to move in and out of the chip. This shuffling should be seen as an inefficiency: a tradeoff to get a certain kind of flexibility in the software stack... But you waste a huge number of CPU cycles moving data between RAM, CPU cache, and graphics card memory.
It stays in HBM, but it still needs to be shuffled to where the computation can actually happen. It's a lot like a normal CPU: the CPU can't do anything with data sitting in system memory; it has to be loaded into a register first.
For every token that is generated, a dense LLM has to read every parameter in the model.
This doesn't seem right. Where is it shuffling to and from? My drives aren't fast enough to load the model every token that fast, and I don't have enough system memory to unload models to.
If you're using an MoE model like DeepSeek V3, the full model is 671 GB but only 37 GB are active per token, so from a memory bandwidth perspective it's more like running a 37 GB model. If you run a quant of that, it could be more like 18 GB.
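To make the bandwidth arithmetic concrete, here's a rough sketch; the 3,350 GB/s figure is illustrative (roughly an H100-class accelerator), and these are upper bounds, not benchmarks:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, active_gb_per_token: float) -> float:
    """Upper bound on single-user decode speed: every generated token has to
    stream the active weights through the compute units at least once."""
    return bandwidth_gb_s / active_gb_per_token

hbm_bandwidth = 3350  # GB/s, illustrative H100-class HBM figure

print(max_tokens_per_sec(hbm_bandwidth, 800))  # dense 800 GB model: ~4 tokens/s
print(max_tokens_per_sec(hbm_bandwidth, 37))   # MoE with 37 GB active: ~90 tokens/s
print(max_tokens_per_sec(hbm_bandwidth, 18))   # quantized MoE, ~18 GB: ~185 tokens/s
```

That's why the active-parameter count, not the total parameter count, is what sets decode speed for MoE models.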
There won't be a single moment when you can observe yourself carrying the weight of everything being moved out of the house, because that's not what's happening. Instead, you can observe yourself taking many small loads until everything is finally moved, at which point you yourself should no longer be loaded down from carrying things out of the house (though you may be loaded with whatever else you're doing).
Measuring active memory bandwidth can be more complicated to set up than it sounds, so the easier way is to just watch your VRAM usage as the model is freshly loaded onto the card. The "nvtop" utility can do this for almost any GPU on Linux, along with other stats you might care about as you watch LLMs run.
My confusion was on the shuffling process happening per token. If this was happening per token, it would be effectively the same as loading the model from disk every token.
The model might get loaded on every token - from GPU memory into the GPU's compute units. This depends on how much of it is cached on the GPU. The inputs to every layer must be loaded as well. Also, if your model doesn't fit in GPU memory but fits in CPU memory and you're doing GPU offloading, then you're also shuffling between CPU and GPU memory.
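Here's a toy sketch of what "loaded on every token" means in a decode loop; the sizes and the tanh "layers" are made up, the point is just that every layer's weights get touched once per generated token:

```python
import numpy as np

# Toy decode step: each layer's weights must reach the compute units before
# that layer can run, whether they come from on-chip cache, GPU memory, or
# (with offloading) CPU memory over PCIe. Sizes here are tiny and made up.
d_model, n_layers = 8, 4
layers = [np.random.randn(d_model, d_model) for _ in range(n_layers)]
hidden = np.random.randn(d_model)

bytes_touched = 0
for W in layers:                  # one generated token
    bytes_touched += W.nbytes     # weight bytes that must be streamed in
    hidden = np.tanh(W @ hidden)  # stand-in for the real layer computation

print(f"weight bytes touched for one token: {bytes_touched}")
```

Scale the weight matrices up to hundreds of gigabytes and run this loop a hundred times a second, and the bandwidth problem described above falls out.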
Laptop manufacturers are too desperate to cash in on the AI craze. There's nothing special about an 'AI PC'. It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
>I don't want this garbage on my laptop, especially when it's running off its battery!
The one bit of good news is it's not going to impact your battery life because it doesn't do any on-device processing. It's just calling an LLM in the cloud.
That's not quite correct. Snapdragon chips that are advertised as being good for "AI" also come with the Hexagon DSP, which is now used for (or targeted at) AI applications. It's essentially a separate vector processor with large vector sizes.
> It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
"AI PC" branded devices get "Copilot+" and additional crap that comes with that due to the NPU. Despite desktops having GPUs with up to 50x more TOPs than the requirement, they don't get all that for some reason https://www.thurrott.com/mobile/copilot-pc/323616/microsoft-...
Doesn't this lead to a lot of tension between the hardware makers and Microsoft?
MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
Laptop manufacturers are making laptops that can run an LLM locally, but there's no point in that unless there's a local LLM to run (and Windows won't have that because Copilot). Are they going to be pre-installing Llama on new laptops?
Are we going to see a new power user / normal user split? Where power users buy laptops with LLMs installed, that can run them, and normal folks buy something that can call Copilot?
It isn't just Copilot that these laptops come with; manufacturers are already bundling their own AI chat apps as well.
For example, the LG gram I recently got came with just such an app, named Chat, though the "AI button" on the keyboard (really just right Alt or Ctrl, I forget which) defaults to Copilot.
If there's any tension at all, it's just over who gets to be the default app for the "AI button" on the keyboard, which I assume almost nobody actually uses.
> MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
MS doesn't care where your data is; they're happy to go digging through your C drive to collect/mine whatever they want (assuming you can avoid all the dark patterns they use to push you to save everything on OneDrive anyway), and they'll record all your interactions with any other AI using Recall.
It's just marketing. The laptop makers will market it as if your laptop power makes a difference knowing full well that it's offloaded to the cloud.
For a slightly more charitable perspective, agentic AI means that there is still a bunch of stuff happening on the local machine, it's just not the inference itself.
There's nothing special about where Intel has set the bar for an 'AI PC'; it's low enough that vendors can market almost anything as one. Ollama can run a 4B model plenty fine on Tiger Lake with 8 GB of classic RAM.
But unified memory IS truly what makes an AI-ready PC. Apple Silicon proves that. People are willing to pay the premium, and I suspect unified memory will still be around and bringing us benefits even if no one cares about LLMs in 5 years.
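On the Ollama point: it's easy to sanity-check on modest hardware. This is a minimal sketch assuming an Ollama server running locally on its default port with a small model already pulled (the model name is just an example):

```python
import json
import urllib.request

# Assumes `ollama serve` is running locally and a small model has been pulled,
# e.g. `ollama pull gemma3:4b` (model name is just an example).
payload = {
    "model": "gemma3:4b",
    "prompt": "Say hello in five words.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

On an older laptop the tokens come out slowly, but it works, no NPU branding required.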
Even collecting and sending all that data to the cloud is going to drain battery life. I'd really rather my devices only do what I ask them to than have AI running the background all the time trying to be helpful or just silently collecting data.
Windows is leaning more and more into AI and embedding it into the core of the OS as much as it can. It's not "an app"; even if that were true now, it wouldn't be true for very long. The strategy is well communicated.
Unfortunately still loads of hurdles for most people.
AAA Games with anti-cheat that don't support Linux.
Video editing (DaVinci Resolve exists but is a pain to get up and running on many distros, KDenLive/OpenShot don't really cut it for most)
Adobe Suite (Photoshop/Lightroom specifically, and Premiere for video editing) - would like to see Affinity support Linux but it hasn't happened so far. GIMP and Darktable aren't really substitutes unless you pour a lot of time into them.
Tried moving to Linux on my laptop this past month; made it a month before a reinstall of Windows 11. Had issues with the WiFi chip (managed to fix it, but had to edit config files deep in the system, which isn't ideal); on Fedora with LUKS encryption, the keyboard wouldn't work to enter the encryption key after a kernel update; and there's no Windows Hello-like support (face ID). Had the most success with EndeavourOS, but running Arch is a chore for most.
It's getting there, best it's ever been, but there's still hurdles.
> AAA Games with anti-cheat that don't support Linux.
I really don't understand people that want to play games so badly that they are willing to install a literal rootkit on their devices. I can understand if you're a pro gamer but it feels stupid to do it otherwise.
According to my friends, Arc Raiders works well on Linux. So it's really just a small selection of AAA games that insist on anti-cheat, which probably doesn't even work. Can you name a AAA title you want to play that Proton says is incompatible?
GIMP isn't a full solution, sure, but it works for what I need. Darktable does way more than I've ever wanted, so I can forgive it for the one time it crashed. Inkscape and Blender both exceed my needs as well.
And Adobe is so user hostile, that I feel I need to call you a mean name to prove how I feel.... dummy!
Yes, I already feel bad, and I'm sorry. But trolling aside, a list of applications that treat users like shit isn't a reason to stay on a platform that also treats you like shit.
I get it: sometimes being treated like shit is worth it, because it's easier now that you're used to being disrespected. But an aversion to the effort it'd take to climb the learning curve of something different isn't a valid reason to help the disrespectful trash companies making the world worse recruit more people for them to treat like trash.
Just because you use it, doesn't make it worth recommending.
I don't really PC game anymore, not at the moment anyway; I use my Xbox or play a few older games my laptop's iGPU can handle. Battlefield 6 is a big recent one that I'd probably want to play if I had a gaming PC set up.
I know Adobe are... c-words, but their software is industry standard for a reason.
> Battlefield 6 is a big recent one that I'd probably want to play if I had a gaming PC set up.
We definitely play very different games, I wouldn't touch it if you paid me. So I'm sure we both have a bit of sample bias in our expected rates of linux compatibility. Especially since EA is another company like Adobe. Also, the internet seems to think they have a cheating problem. I wonder how bad it really is, and if it's worth the cost of the anti-cheat.
They're the industry standard because they were first, not necessarily because they were better. They do have a feature set that's near impossible to beat; not even I can pretend they don't. I'm just saying that respect and fairness are more important to me than content-aware fill will ever be.
It’s not an LLM, but it is genAI. It’s based on the same idea of predict-the-next-thing, but instead of predicting words it predicts the next state of the atmosphere from the current state.
> Which is surprising to me because I didn't think it would work for this; they're bad at estimating uncertainty for instance.
FGN (the model behind 'WeatherNext 2'), FourCastNet 3 (NVIDIA's offering), and AIFS-CRPS (the model from ECMWF) have all moved to training on whole ensembles, using a continuous ranked probability score (CRPS) loss function. Minimizing the CRPS minimizes the integrated squared difference between the cumulative distribution functions of the prediction and the truth, so it effectively teaches the model to have uncertainty proportional to its expected error.
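For intuition, the standard ensemble estimator of the CRPS is only a couple of lines; this is a minimal numpy sketch, not any of these models' actual training code:

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    A sharp-but-biased ensemble scores badly, and so does a needlessly wide one,
    so minimizing it rewards spread that matches the expected error."""
    members = np.asarray(members, dtype=float)
    error_term = np.mean(np.abs(members - obs))
    spread_term = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return error_term - spread_term

# Overconfident ensemble vs. one whose spread matches its error (obs = 3.0):
print(crps_ensemble(np.array([2.0, 2.1, 1.9, 2.0]), obs=3.0))   # sharp but biased: high CRPS
print(crps_ensemble(np.random.normal(3.0, 1.0, 500), obs=3.0))  # calibrated: much lower CRPS
```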
GenCast is a more classic diffusion-based model trained on a mean-squared-error-type loss function, much like any of the image diffusion models. Nonetheless it performed well.
The trouble is that English doesn't fit neatly into any of these categories. It has features that make it at least context-free, but it can't handle other features of context-free languages, like unlimited nesting.
Ultimately these are categories of formal languages, and natural language is an entirely different kind of thing.
No reason to do that though, except to validate some random person's perspective on language. The sky will not open and smash us with a giant foot if we reject such an obligation.
OK? What concrete problems facing human biology are resolved by this group's consensus? Obsession with notation does little to improve crop yields or the working conditions of the child labor these academic geniuses rely on.
Sure, linguists, glad you found some semantics that fit your obsession. Happy for you!
Most people will never encounter their work and live their lives never knowing such an event happened.
You can also reject quantum physics and the sky will not open and smash us with a giant foot. However, to do so without serious knowledge of physics would be quite dumb.
Apples and oranges. Language emerges from human biology, which emerges from the physical realm; in the end, then, language emerges from the physical realm. Trying to decouple it from physical nature and make it an abstract thought bubble is akin to bikeshedding in programming.
> TikTok was only able to receive this information with the help of the Israeli data company AppsFlyer and Grindr itself.
So basically, the TikTok app is not spying on your dating apps - your dating apps are willingly selling your information to them, through intermediaries.
This means uninstalling TikTok won't help. And worse, many other companies are getting your dating info too.
Grindr had a big data "leak" in 2024.[1] Not a "leak", really, just ordinary reselling of people's gay and HIV status. In 2025, a data broker who resold Grindr data also had a big breach. That wasn't Grindr-specific - it included Temple Run, Subway Surfers, Tinder, Grindr, MyFitnessPal, Candy Crush, Truecaller, 9GAG, Microsoft 365, and others. But not TikTok, because TikTok monetizes that info themselves.
Why does that matter if he's not seeing ads? A severely crippled adblocker means that you would see ads during regular usage.
Additionally, Brave, a Chromium-based browser, has ad blocking built into the browser itself, meaning it is not affected by WebExtension changes and does not require trusting an additional third party.
>the present generation of automated systems, which are monitored by former manual operators, are riding on their skills, which later generations of operators cannot be expected to have.
But we are in the later generation now. All the 1983 operators are now retired, and today's factory operators have never had the experience of 'doing it by hand'.
Operators still have skills, but it's 'what to do when the machine fails' rather than 'how to operate fully manually'. Many systems cannot be operated fully manually under any conditions.
And yet they're still doing great. Factory automation has been wildly successful and is responsible for why manufactured goods are so plentiful and inexpensive today.
It's not so simple. The knowledge hasn't been transferred to future operators but to process engineers, who are now in charge of making the processes work reliably through even more advanced automation, which requires more complex skills and technology to develop and produce.
No doubt, there are people that still have knowledge of how the system works.
But operator inexperience didn't turn out to be a substantial barrier to automation, and they were still able to achieve the end goal of producing more things at lower cost.
Google made some very large n-gram models around twenty years ago. This being before the era of ultra-high-speed internet, the data was distributed as a set of 6 DVDs.
It achieved state-of-the-art performance at tasks like spelling correction at the time. However, unlike an LLM, it can't generalize at all; if an n-gram isn't in the training corpus it has no idea how to handle it.
I have this DVD set in my basement. Technically, there are still methods for estimating the probability of unseen n-grams. Backoff (interpolating with lower-order n-grams) is an option. You can also impose prior distributions, like a Bayesian, so that you can make "rational" guesses.
N-grams are surprisingly powerful for how little computation they require. They can be trained in seconds even with tons of data.
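A toy sketch of the simplest version of that idea: a bigram model with add-one (Laplace) smoothing, the crudest form of the prior-distribution trick, so unseen bigrams still get a small nonzero probability instead of breaking the model. (This is an illustration, not proper Katz backoff.)

```python
from collections import Counter

# Tiny bigram model with add-one (Laplace) smoothing over a toy corpus.
corpus = "the cat sat on the mat the cat ate".split()
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(word: str, prev: str) -> float:
    """P(word | prev) with add-one smoothing: unseen bigrams get a small
    nonzero probability instead of zero."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(p("sat", "cat"))  # seen bigram "cat sat": relatively high probability
print(p("on", "ate"))   # unseen bigram "ate on": small but nonzero
```

"Training" here is just counting, which is why even web-scale n-gram models were feasible twenty years ago.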