* Hybrid MoE: 2-3x faster than pure MoE transformers
* 1M context length
* Trained on NVFP4
* Open Source! Pretraining, mid-training, SFT, and RL datasets released (the SFT HF link is 404...)
* Open model training recipe (coming soon)
Really appreciate Nvidia being the most open lab, but they should make sure all the links/data are actually available on day 0.
Also interesting that the model was trained in NVFP4 but the released inference weights are FP8.
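If you want to check what the released checkpoint actually declares, a quick sketch with transformers (the repo id is a placeholder, not the real model name):

    from transformers import AutoConfig

    # Placeholder repo id -- substitute the actual checkpoint name.
    config = AutoConfig.from_pretrained("nvidia/<model-repo>")

    # torch_dtype reports the base dtype; quantization_config (if present)
    # carries the quantization scheme details for the released weights.
    print(getattr(config, "torch_dtype", None))
    print(getattr(config, "quantization_config", None))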