LLMs are enormously bandwidth-hungry. You have to shuffle your 800 GB neural network in and out of memory for every token, which can take more time and energy than actually doing the matrix multiplies. Even GPUs are barely high-bandwidth enough.
But even so, for a single user, the output rate for a very fast LLM would be something like 100 tokens per second. With graphics, we're talking about 2 million pixels, 60 times a second: 120 million pixels per second for a standard high-res screen. Big difference between 100 tokens and 120 million pixels per second.
24-bit pixels give 16 million possible colors... For tokens, that many values is probably enough to represent every word in the combined vocabulary of every major national language on earth.
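To put rough numbers on that comparison, here's a back-of-the-envelope sketch; the 100 tokens/s figure and the ~2-megapixel, 60 Hz display are just the values quoted above, and ~17 bits per token assumes a vocabulary of around 128k entries:

```python
# Back-of-the-envelope output data rates, using the figures from the comment above.
tokens_per_sec = 100          # fast single-user LLM decode
bits_per_token = 17           # 2**17 = 131072, roughly a modern LLM vocabulary

pixels_per_frame = 2_000_000  # "standard high res screen"
frames_per_sec = 60
bits_per_pixel = 24

llm_bits_per_sec = tokens_per_sec * bits_per_token
display_bits_per_sec = pixels_per_frame * frames_per_sec * bits_per_pixel

print(f"LLM text output: {llm_bits_per_sec:,} bits/s")      # ~1,700 bits/s
print(f"Display output:  {display_bits_per_sec:,} bits/s")  # 2,880,000,000 bits/s
print(f"Ratio: ~{display_bits_per_sec // llm_bits_per_sec:,}x")
```

The asymmetry is the point: the LLM's tiny output rate still requires enormous internal memory traffic per token, which is what the bandwidth complaint above is about.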
> You have to shuffle your 800GB neural network in and out of memory
Do you really though? That seems more like a constraint imposed by graphics cards. A specialized AI chip would just keep the weights and all parameters in memory/hardware right where they are and update them in-situ. It seems a lot more efficient.
I think that it's because graphics cards have such high bandwidth that people decided to use this approach but it seems suboptimal.
But if we want to be optimal, then ideally only the inputs and outputs would need to move in and out of the chip. This shuffling should be seen as an inefficiency: a tradeoff to get a certain kind of flexibility in the software stack... But you waste a huge number of CPU cycles moving data between RAM, CPU cache, and graphics card memory.
It stays in HBM, but it still needs to be shuffled to where the computation can actually happen. It's a lot like a normal CPU: the CPU can't do anything with data sitting in system memory; it has to be loaded into a register first.
For every token that is generated, a dense LLM has to read every parameter in the model.
This doesn't seem right. Where is it shuffling to and from? My drives aren't fast enough to load the model every token that fast, and I don't have enough system memory to unload models to.
If you're using an MoE model like DeepSeek V3, the full model is 671 GB but only 37 GB are active per token, so from a memory bandwidth perspective it's more like running a 37 GB model. If you run a quant of that, it could be more like 18 GB.
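To make the bandwidth arithmetic concrete, here's a rough sketch; the 3,350 GB/s figure is illustrative (roughly an H100-class accelerator), and these are upper bounds, not benchmarks:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, active_gb_per_token: float) -> float:
    """Upper bound on single-user decode speed: every generated token has to
    stream the active weights through the compute units at least once."""
    return bandwidth_gb_s / active_gb_per_token

hbm_bandwidth = 3350  # GB/s, illustrative H100-class HBM figure

print(max_tokens_per_sec(hbm_bandwidth, 800))  # dense 800 GB model: ~4 tokens/s
print(max_tokens_per_sec(hbm_bandwidth, 37))   # MoE with 37 GB active: ~90 tokens/s
print(max_tokens_per_sec(hbm_bandwidth, 18))   # quantized MoE, ~18 GB: ~185 tokens/s
```

That's why the active-parameter count, not the total parameter count, is what sets decode speed for MoE models.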
There won't be a single moment when you can observe yourself carrying the weight of everything being moved out of the house, because that's not what's happening. Instead, you can observe yourself taking many small loads until everything is finally moved, at which point you yourself should no longer be loaded down from carrying things out of the house (though you may be loaded with whatever else you're doing).
Measuring active memory bandwidth can be more complicated to set up than it sounds, so the easier way is to just watch your VRAM usage as the model is freshly loaded onto the card. The "nvtop" utility can do this for almost any GPU on Linux, along with other stats you might care about as you watch LLMs run.
My confusion was on the shuffling process happening per token. If this was happening per token, it would be effectively the same as loading the model from disk every token.
The model might get loaded on every token - from GPU memory into the GPU's compute units. This depends on how much of it is cached on the GPU. The inputs to every layer must be loaded as well. Also, if your model doesn't fit in GPU memory but fits in CPU memory and you're doing GPU offloading, then you're also shuffling between CPU and GPU memory.
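Here's a toy sketch of what "loaded on every token" means in a decode loop; the sizes and the tanh "layers" are made up, the point is just that every layer's weights get touched once per generated token:

```python
import numpy as np

# Toy decode step: each layer's weights must reach the compute units before
# that layer can run, whether they come from on-chip cache, GPU memory, or
# (with offloading) CPU memory over PCIe. Sizes here are tiny and made up.
d_model, n_layers = 8, 4
layers = [np.random.randn(d_model, d_model) for _ in range(n_layers)]
hidden = np.random.randn(d_model)

bytes_touched = 0
for W in layers:                  # one generated token
    bytes_touched += W.nbytes     # weight bytes that must be streamed in
    hidden = np.tanh(W @ hidden)  # stand-in for the real layer computation

print(f"weight bytes touched for one token: {bytes_touched}")
```

Scale the weight matrices up to hundreds of gigabytes and run this loop a hundred times a second, and the bandwidth problem described above falls out.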
Laptop manufacturers are too desperate to cash in on the AI craze. There's nothing special about an 'AI PC'. It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
>I don't want this garbage on my laptop, especially when it's running off its battery!
The one bit of good news is it's not going to impact your battery life because it doesn't do any on-device processing. It's just calling an LLM in the cloud.
That's not quite correct. Snapdragon chips that are advertised as being good for "AI" also come with the Hexagon DSP, which is now used for (or targeted at) AI applications. It's essentially a separate vector processor with large vector sizes.
> It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
"AI PC" branded devices get "Copilot+" and additional crap that comes with that due to the NPU. Despite desktops having GPUs with up to 50x more TOPs than the requirement, they don't get all that for some reason https://www.thurrott.com/mobile/copilot-pc/323616/microsoft-...
Doesn't this lead to a lot of tension between the hardware makers and Microsoft?
MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
Laptop manufacturers are making laptops that can run an LLM locally, but there's no point in that unless there's a local LLM to run (and Windows won't have that because Copilot). Are they going to be pre-installing Llama on new laptops?
Are we going to see a new power user / normal user split? Where power users buy laptops with LLMs installed, that can run them, and normal folks buy something that can call Copilot?
It isn't just Copilot that these laptops come with; manufacturers are already bundling their own AI chat apps as well.
For example, the LG gram I recently got came with just such an app, named Chat, though the "AI button" on the keyboard (really just right Alt or Ctrl, I forget which) defaults to Copilot.
If there's any tension at all, it's just over who gets to be the default app for the "AI button" on the keyboard, which I assume almost nobody actually uses.
> MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
MS doesn't care where your data is; they're happy to go digging through your C drive to collect/mine whatever they want (assuming you can avoid all the dark patterns they use to push you to save everything on OneDrive anyway), and they'll record all your interactions with any other AI using Recall.
It's just marketing. The laptop makers will market it as if your laptop power makes a difference knowing full well that it's offloaded to the cloud.
For a slightly more charitable perspective, agentic AI means that there is still a bunch of stuff happening on the local machine, it's just not the inference itself.
There's nothing special about where Intel has set the bar for an 'AI PC'; it's low enough that vendors can market almost anything as one. Ollama can run a 4B model plenty fine on Tiger Lake with 8 GB of classic RAM.
But unified memory IS truly what makes an AI-ready PC. Apple Silicon proves that. People are willing to pay the premium, and I suspect unified memory will still be around and bringing us benefits even if no one cares about LLMs in 5 years.
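On the Ollama point: it's easy to sanity-check on modest hardware. This is a minimal sketch assuming an Ollama server running locally on its default port with a small model already pulled (the model name is just an example):

```python
import json
import urllib.request

# Assumes `ollama serve` is running locally and a small model has been pulled,
# e.g. `ollama pull gemma3:4b` (model name is just an example).
payload = {
    "model": "gemma3:4b",
    "prompt": "Say hello in five words.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

On an older laptop the tokens come out slowly, but it works, no NPU branding required.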
Even collecting and sending all that data to the cloud is going to drain battery life. I'd really rather my devices only do what I ask them to than have AI running the background all the time trying to be helpful or just silently collecting data.
Windows is leaning more and more into AI and embedding it into the core of the OS as much as it can. It's not "an app"; even if that were true now, it wouldn't be true for very long. The strategy is well communicated.
Unfortunately still loads of hurdles for most people.
AAA Games with anti-cheat that don't support Linux.
Video editing (DaVinci Resolve exists but is a pain to get up and running on many distros, KDenLive/OpenShot don't really cut it for most)
Adobe Suite (Photoshop/Lightroom specifically, and Premiere for video editing) - would like to see Affinity support Linux but it hasn't happened so far. GIMP and Darktable aren't really substitutes unless you pour a lot of time into them.
Tried moving to Linux on my laptop this past month; made it a month before a reinstall of Windows 11. Had issues with the WiFi chip (managed to fix it, but had to edit config files deep in the system, which isn't ideal); on Fedora with LUKS encryption, the keyboard wouldn't work to enter the encryption key after a kernel update; and there's no Windows Hello-like support (face ID). Had the most success with EndeavourOS, but running Arch is a chore for most.
It's getting there, best it's ever been, but there's still hurdles.
> AAA Games with anti-cheat that don't support Linux.
I really don't understand people that want to play games so badly that they are willing to install a literal rootkit on their devices. I can understand if you're a pro gamer but it feels stupid to do it otherwise.
According to my friends, Arc Raiders works well on Linux. So it's really just a small selection of AAA games that insist on anti-cheat, which probably doesn't even work. Can you name a AAA title you want to play that Proton says is incompatible?
GIMP isn't a full solution, sure, but it works for what I need. Darktable does way more than I've ever wanted, so I can forgive it for the one time it crashed. Inkscape and Blender both exceed my needs as well.
And Adobe is so user hostile, that I feel I need to call you a mean name to prove how I feel.... dummy!
Yes, I already feel bad, and I'm sorry. But trolling aside, a list of applications that treat users like shit isn't a reason to stay on a platform that also treats you like shit.
I get it: sometimes being treated like shit is worth it, because it's easier now that you're used to being disrespected. But an aversion to the effort it'd take to climb the learning curve of something different isn't a valid reason to help the disrespectful trash companies making the world worse recruit more people for them to treat like trash.
Just because you use it, doesn't make it worth recommending.
I don't really PC game anymore, not at the moment anyway; I use my Xbox or play a few older games my laptop's iGPU can handle. Battlefield 6 is a big recent one that I'd probably want to play if I had a gaming PC set up.
I know Adobe are... c-words, but their software is industry standard for a reason.
> Battlefield 6 is a big recent one that I'd probably want to play if I had a gaming PC set up.
We definitely play very different games, I wouldn't touch it if you paid me. So I'm sure we both have a bit of sample bias in our expected rates of linux compatibility. Especially since EA is another company like Adobe. Also, the internet seems to think they have a cheating problem. I wonder how bad it really is, and if it's worth the cost of the anti-cheat.
They're the industry standard because they were first, not necessarily because they were better. They do have a feature set that's near impossible to beat; not even I can pretend they don't. I'm just saying that respect and fairness are more important to me than content-aware fill will ever be.
It’s not an LLM, but it is genAI. It’s based on the same idea of predict-the-next-thing, but instead of predicting words it predicts the next state of the atmosphere from the current state.
> Which is surprising to me because I didn't think it would work for this; they're bad at estimating uncertainty for instance.
FGN (the model behind 'WeatherNext 2'), FourCastNet 3 (NVIDIA's offering), and AIFS-CRPS (the model from ECMWF) have all moved to training on whole ensembles, using a continuous ranked probability score (CRPS) loss function. Minimizing the CRPS minimizes the integrated squared difference between the cumulative distribution functions of the prediction and the truth, so it effectively teaches the model to have uncertainty proportional to its expected error.
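For intuition, the standard ensemble estimator of the CRPS is only a couple of lines; this is a minimal numpy sketch, not any of these models' actual training code:

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    A sharp-but-biased ensemble scores badly, and so does a needlessly wide one,
    so minimizing it rewards spread that matches the expected error."""
    members = np.asarray(members, dtype=float)
    error_term = np.mean(np.abs(members - obs))
    spread_term = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return error_term - spread_term

# Overconfident ensemble vs. one whose spread matches its error (obs = 3.0):
print(crps_ensemble(np.array([2.0, 2.1, 1.9, 2.0]), obs=3.0))   # sharp but biased: high CRPS
print(crps_ensemble(np.random.normal(3.0, 1.0, 500), obs=3.0))  # calibrated: much lower CRPS
```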
GenCast is a more classic diffusion-based model trained on a mean-squared-error-type loss function, much like any of the image diffusion models. Nonetheless it performed well.
The trouble is that English doesn't fit neatly into any of these categories. It has features that make it at least context-free, but it can't handle other features of context-free languages, like unlimited nesting.
Ultimately these are categories of formal languages, and natural language is an entirely different kind of thing.
No reason to do that though, except to validate some random person's perspective on language. The sky will not open and smash us with a giant foot if we reject such an obligation.
OK? What concrete problems facing human biology are resolved by this group's consensus? Obsession with notation does little to improve crop yields or the working conditions of the child labor these academic geniuses rely on.
Sure, linguists, glad you found some semantics that fit your obsession. Happy for you!
Most people will never encounter their work and live their lives never knowing such an event happened.
You can also reject quantum physics and the sky will not open and smash us with a giant foot. However, to do so without serious knowledge of physics would be quite dumb.
Apples and oranges. Language emerges from human biology, which emerges from the physical realm; in the end, then, language emerges from the physical realm. Trying to decouple it from physical nature and make it an abstract thought bubble is akin to bikeshedding in programming.
> TikTok was only able to receive this information with the help of the Israeli data company AppsFlyer and Grindr itself.
So basically, the TikTok app is not spying on your dating apps - your dating apps are willingly selling your information to them, through intermediaries.
This means uninstalling TikTok won't help. And worse, many other companies are getting your dating info too.
Grindr had a big data "leak" in 2024.[1] Not a "leak", really, just ordinary reselling of people's gay and HIV status. In 2025, a data broker who resold Grindr data also had a big breach. That wasn't Grindr-specific - it included Temple Run, Subway Surfers, Tinder, Grindr, MyFitnessPal, Candy Crush, Truecaller, 9GAG, Microsoft 365, and others. But not TikTok, because TikTok monetizes that info themselves.
Why does that matter if he's not seeing ads? A severely crippled adblocker means that you would see ads during regular usage.
Additionally, Brave, a Chromium-based browser, has ad blocking built into the browser itself, meaning it is not affected by WebExtension changes and does not require trusting an additional third party.
>the present generation of automated systems, which are monitored by former manual operators, are riding on their skills, which later generations of operators cannot be expected to have.
But we are in the later generation now. All the 1983 operators are now retired, and today's factory operators have never had the experience of 'doing it by hand'.
Operators still have skills, but it's 'what to do when the machine fails' rather than 'how to operate fully manually'. Many systems cannot be operated fully manually under any conditions.
And yet they're still doing great. Factory automation has been wildly successful and is responsible for why manufactured goods are so plentiful and inexpensive today.
It's not so simple. The knowledge hasn't been transferred to future operators but to process engineers, who are now in charge of making the processes work reliably through even more advanced automation, which requires more complex skills and technology to develop and produce.
No doubt, there are people that still have knowledge of how the system works.
But operator inexperience didn't turn out to be a substantial barrier to automation, and they were still able to achieve the end goal of producing more things at lower cost.
Google made some very large n-gram models around twenty years ago. This being before the era of ultra-high-speed internet, the data was distributed as a set of 6 DVDs.
It achieved state-of-the-art performance at tasks like spelling correction at the time. However, unlike an LLM, it can't generalize at all; if an n-gram isn't in the training corpus it has no idea how to handle it.
I have this DVD set in my basement. Technically, there are still methods for estimating the probability of unseen n-grams. Backoff (interpolating with lower-order n-grams) is an option. You can also impose prior distributions, like a Bayesian, so that you can make "rational" guesses.
N-grams are surprisingly powerful for how little computation they require. They can be trained in seconds even with tons of data.
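A toy sketch of the simplest version of that idea: a bigram model with add-one (Laplace) smoothing, the crudest form of the prior-distribution trick, so unseen bigrams still get a small nonzero probability instead of breaking the model. (This is an illustration, not proper Katz backoff.)

```python
from collections import Counter

# Tiny bigram model with add-one (Laplace) smoothing over a toy corpus.
corpus = "the cat sat on the mat the cat ate".split()
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(word: str, prev: str) -> float:
    """P(word | prev) with add-one smoothing: unseen bigrams get a small
    nonzero probability instead of zero."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(p("sat", "cat"))  # seen bigram "cat sat": relatively high probability
print(p("on", "ate"))   # unseen bigram "ate on": small but nonzero
```

"Training" here is just counting, which is why even web-scale n-gram models were feasible twenty years ago.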