I find it really interesting that it uses a Mamba-Transformer hybrid. Is it the only significant model right now that uses SSM layers, at least partially? This must contribute to lower VRAM requirements, right? Does it change how KV caching works?

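On the VRAM question, a rough back-of-the-envelope sketch may help (all sizes below are hypothetical placeholders, not this model's actual dimensions): an attention layer has to cache K and V for every past token, so its cache grows linearly with context length, whereas a Mamba/SSM layer carries a fixed-size recurrent state no matter how long the context gets. In a hybrid, only the attention layers contribute a growing KV cache.

```python
# Hypothetical sizes for illustration only; not taken from any specific model.

def attention_kv_cache_bytes(seq_len, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # K and V: two tensors of shape (seq_len, n_kv_heads, head_dim) per layer,
    # so the cache grows linearly with the number of tokens seen so far.
    return 2 * seq_len * n_kv_heads * head_dim * dtype_bytes

def ssm_state_bytes(d_model=4096, state_dim=16, conv_width=4, dtype_bytes=2):
    # A Mamba-style layer keeps a recurrent state of shape (d_model, state_dim)
    # plus a small convolution buffer of shape (d_model, conv_width); neither
    # depends on sequence length.
    return (d_model * state_dim + d_model * conv_width) * dtype_bytes

for seq_len in (1_024, 32_768, 131_072):
    kv = attention_kv_cache_bytes(seq_len)
    ssm = ssm_state_bytes()
    print(f"{seq_len:>7} tokens: attention cache ~ {kv / 2**20:8.1f} MiB/layer, "
          f"SSM state ~ {ssm / 2**20:.2f} MiB/layer")
```

Under these assumed dimensions the attention cache goes from a few MiB per layer at 1K tokens to hundreds of MiB per layer at 128K, while the SSM state stays constant, which is the usual argument for why replacing some attention layers with SSM layers lowers the memory footprint for long contexts.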