Isn't the ability to store past reasoning in an external system, so you avoid having to redo the computation, precisely what a memory is, though?
Mathematically, sure, KV-caching is equivalent to redoing prefill at every token; it's just an optimization. But the important part of my message was the attention.
A plan or piece of reasoning formed during the forward pass of token 0 can be attended to by the subsequent (or parallel, if you don't want to use the cache) passes of tokens 1, …, n. So you cannot consider token n to be starting from scratch in terms of reasoning/planning: it can reuse what was already planned during previous tokens.
If you think about inference with KV-caching, even though you are right that mathematically it's just an optimization, it makes this behavior much easier to reason about: the KV cache is a store of past internal states that the model can attend to when generating subsequent tokens, which lets each subsequent token's hidden states be more than a repetition of what the model already reasoned about in the past.
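To make the equivalence concrete, here is a minimal sketch, assuming PyTorch; the names (`attend`, `k_cache`, etc.) are illustrative, not any real library's API. It computes single-head attention for the last token two ways, full recompute over the prefix versus incremental decoding against a cache of past keys/values, and checks that the outputs match:

```python
# Minimal sketch (assumes PyTorch): single-head causal attention computed
# two ways, full recompute vs. incremental decoding with a KV cache.
# All names here are illustrative, not a real framework API.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 16                      # head dimension
n = 5                       # sequence length
x = torch.randn(n, d)       # token hidden states entering the attention layer
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention of one query against all keys/values.
    scores = (q @ K.T) / d ** 0.5
    return F.softmax(scores, dim=-1) @ V

# (a) Full "prefill" recompute: keys/values for the whole prefix are
# rebuilt from scratch before attending with the last token's query.
K_full, V_full = x @ Wk, x @ Wv
out_full = attend(x[-1] @ Wq, K_full, V_full)

# (b) Incremental decoding: each token's key/value is appended to the
# cache once and reused; past forward-pass states are never recomputed.
k_cache, v_cache = [], []
for t in range(n):
    k_cache.append(x[t] @ Wk)
    v_cache.append(x[t] @ Wv)
out_cached = attend(x[-1] @ Wq, torch.stack(k_cache), torch.stack(v_cache))

assert torch.allclose(out_full, out_cached)  # same result, far less compute
```

The assert passing is the whole point: the cache changes only how much work is redone, while the states token n attends to are exactly the ones produced during earlier tokens' forward passes.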
It's correct to state that the LLM starts anew for each token.
The workaround for this is to pass the existing plan back to it as part of the context.
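For instance, a hypothetical sketch of that workaround (`client` and `generate` are placeholders, not a real API):

```python
# Hypothetical sketch: the model keeps no state between calls, so any plan
# it produced earlier must be re-injected into the prompt by the caller.
plan = "1) parse input  2) validate fields  3) emit report"
prompt = (
    "You previously made this plan:\n"
    f"{plan}\n"
    "Continue from step 2."
)
# reply = client.generate(prompt)  # `client` stands in for any LLM API
```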