
What stood out to me is how much of gpt-oss’s “newness” isn’t about radical architectural departures, but about a careful layering of well-understood optimizations—RoPE, SwiGLU, GQA, MoE—with some slightly unusual choices (tiny sliding-window sizes, few large experts instead of many small ones, per-head attention sinks).
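
For anyone who hasn't stared at these pieces in isolation: none of them is exotic on its own. Here's roughly what a SwiGLU feed-forward looks like in PyTorch (a sketch, not gpt-oss's actual code; the dimensions are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLU(nn.Module):
        """SwiGLU FFN: down(silu(gate(x)) * up(x))."""
        def __init__(self, d_model: int, d_ff: int):
            super().__init__()
            self.w_gate = nn.Linear(d_model, d_ff, bias=False)
            self.w_up = nn.Linear(d_model, d_ff, bias=False)
            self.w_down = nn.Linear(d_ff, d_model, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # SiLU-gated elementwise product, projected back to d_model
            return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

The interesting part is never any one block; it's the tuning of the whole stack.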

The MXFP4 quantization detail might be the sleeper feature here. Getting 20B running on a 16 GB consumer card, or 120B on a single H100/MI300X without multi-GPU orchestration headaches, could be a bigger enabler for indie devs and researchers than raw benchmark deltas. A lot of experimentation never happens simply because the friction of getting the model loaded is too high.
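
Back-of-envelope on why it fits, assuming ~4.25 bits/param for the MXFP4 tensors (4-bit values plus one shared 8-bit scale per 32-element block) and bf16 for everything else. The MoE/non-MoE splits below are my guesses, not official figures:

    def mxfp4_gb(n_mxfp4_params, n_bf16_params):
        # 4-bit elements + one 8-bit shared scale per 32-element block
        bits = n_mxfp4_params * 4.25 + n_bf16_params * 16
        return bits / 8 / 1e9

    print(mxfp4_gb(19e9, 1.5e9))   # ~13 GB -> 20B fits on a 16 GB card
    print(mxfp4_gb(114e9, 6e9))    # ~73 GB -> 120B fits on one 80 GB H100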

One open question I’m curious about: given gpt-oss’s design bias toward reasoning (and away from encyclopedic recall), will we start seeing a formal split in open-weight model development—specialized “reasoners” that rely on tool use for facts, and “knowledge bases” tuned for retrieval-heavy work? That separation could change how we architect systems that wrap these models.



> will we start seeing a formal split in open-weight model development—specialized “reasoners” that rely on tool use for facts, and “knowledge bases” tuned for retrieval-heavy work?

My bet's on the former winning outright. It's very hard to outrun a good search engine; LLMs are inherently lossy, so internal recall will never be perfect; and if you don't have to spend your parameter budget encoding facts, you can either spend that budget on being a much better reasoner or shrink the model and make it cheaper to run at the same capability. The trade-off is a more complex architecture, but that's happening anyway.


> that rely on tool use for facts, and “knowledge bases” tuned for retrieval-heavy work

I would say this isn't exclusive to the smaller OSS models, but rather a trait of OpenAI's models altogether now.

This becomes especially apparent with the introduction of GPT-5 in ChatGPT. Their focus on routing your request to different modes and searching the web automatically (relying on agentic workflows in the background) is probably key to the overall quality of the output.

So far, it's quite easy to get their OSS models to follow instructions reliably. Qwen models have been pretty decent at this for some time now too.

I think if we give it another generation or two, we'll be at the point of having competent enough models to run more advanced agentic workflows on modest hardware. We're almost there now, but not quite yet.
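
By "agentic workflows" I mean loops like this sketch against a local OpenAI-compatible server (e.g. vLLM or llama.cpp serving gpt-oss); web_search() here is a stand-in for whatever search backend you'd actually wire up:

    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
    tools = [{"type": "function", "function": {
        "name": "web_search",
        "description": "Search the web and return result snippets.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}}]

    def web_search(query: str) -> str:
        ...  # call your search backend here

    messages = [{"role": "user", "content": "Who won the 2024 Tour de France?"}]
    while True:
        resp = client.chat.completions.create(
            model="gpt-oss-20b", messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:          # no more tool requests: final answer
            print(msg.content)
            break
        messages.append(msg)            # keep the assistant's tool request
        for call in msg.tool_calls:     # run each tool, feed results back
            args = json.loads(call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": web_search(**args)})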


It is by design. OpenAI is not going to reveal any architectural innovation they have made in their own commercial models.


Maybe not an architectural innovation, but both the Harmony format and the split into system/developer/user messages (instead of just system/user) are novel in the released-weights world, and different enough that I'm still in the process of updating my libraries so I can run fair benchmarks...
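
For anyone who hasn't looked yet, the rendered prompt looks roughly like this (from memory; check the openai/harmony repo for the exact tokens):

    <|start|>system<|message|>You are a helpful assistant. Reasoning: high<|end|>
    <|start|>developer<|message|># Instructions: answer in one sentence.<|end|>
    <|start|>user<|message|>What is 2+2?<|end|>
    <|start|>assistant<|channel|>analysis<|message|>...chain of thought...<|end|>
    <|start|>assistant<|channel|>final<|message|>4<|return|>

The developer role sits between the system message and the user, and the channel tags separate the reasoning trace from the final answer, which is what most existing chat-template code doesn't expect.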


MXFP4's mixed-precision approach (4-bit weights, higher precision for the KV cache) actually offers a better accuracy/size tradeoff than competing quantization methods like GPTQ or AWQ, which is why it enables these impressive resource profiles without the typical 4-bit degradation.
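
For the curious: MXFP4 (per the OCP Microscaling spec) stores blocks of 32 values sharing one power-of-two scale, each value in FP4/E2M1. A toy roundtrip in numpy, with simplified scale selection and rounding, so a sketch of the format rather than the spec's exact behavior or OpenAI's kernels:

    import numpy as np

    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

    def mxfp4_roundtrip(block):
        """Quantize then dequantize one 32-element block."""
        amax = np.abs(block).max()
        if amax == 0:
            return block
        # Shared power-of-two scale so the largest magnitude fits in FP4's +/-6
        scale = 2.0 ** np.ceil(np.log2(amax / 6.0))
        scaled = block / scale
        # Snap each element to the nearest representable E2M1 magnitude
        idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
        return np.sign(scaled) * FP4_GRID[idx] * scale

    w = np.random.randn(32)
    print(np.abs(w - mxfp4_roundtrip(w)).max())  # per-block quantization error

The per-block shared scale is what keeps accuracy up: outliers only distort their own 32-element block instead of the whole tensor.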


> careful layering of well-understood optimizations—RoPE, SwiGLU, GQA, MoE

They basically cloned Qwen3 on that, then added the few tweaks you mention on top.


You seem to be conflating when you first heard about those techniques and when they first appeared. None of those techniques were first seen in Qwen, nor this specific combination of techniques.


> They basically cloned Qwen3 on that

Oh, come on! GPT-4 was rumoured to be an MoE well before Qwen even started releasing models. oAI didn't have to "clone" anything.


First, it would be great if people stopped acting as if these billion-dollar corporations were sports teams.

Second, I'm not claiming OpenAI had to clone anything, and I have no reason to believe their proprietary models copy anyone else's. But for these particular open-weight models, they have a clear incentive to use exactly the same architectural base as another actor's, in order to avoid leaking too much information about their own secret sauce.

And finally, though GPT-4 was an MoE, it was most likely what TFA calls “early MoE”: a few very big experts, not many small ones.



