* Hybrid MoE: 2-3x faster than pure MoE transformers
* 1M context length
* Trained on NVFP4
* Open Source! Pretraining, mid-training, SFT, and RL datasets released (the SFT HF link is 404...)
* Open model training recipe (coming soon)
Really appreciate Nvidia being the most open lab, but they should make sure all the links/data are actually available on day 0.
Also interesting that the model was trained in NVFP4 but the released inference weights are FP8.
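If you want to check what the released checkpoint actually declares, a quick sketch with transformers (the repo id is a placeholder, not the real model name):

    from transformers import AutoConfig

    # Placeholder repo id -- substitute the actual checkpoint name.
    config = AutoConfig.from_pretrained("nvidia/<model-repo>")

    # torch_dtype reports the base dtype; quantization_config (if present)
    # carries the quantization scheme details for the released weights.
    print(getattr(config, "torch_dtype", None))
    print(getattr(config, "quantization_config", None))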