Hacker News
radarsat1 7 hours ago | on: Nvidia Nemotron 3 Family of Models
I find it really interesting that it uses a Mamba hybrid with Transformers. Is it the only significant model right now using SSM layers, at least partially? This must contribute to lower VRAM requirements, right? And does it change how KV caching works?
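To make the VRAM question concrete, here is a rough back-of-the-envelope sketch. The general point it illustrates: an attention layer's KV cache grows linearly with context length, while a Mamba-style SSM layer keeps a fixed-size recurrent state. Every number below (layer counts, head sizes, state sizes) is a made-up placeholder, not Nemotron's actual configuration, and the estimate ignores the small convolution state Mamba layers also keep.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    Each attention layer stores a key and a value tensor (hence the factor 2),
    each of shape seq_len x n_kv_heads x head_dim.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_ssm_layers, d_inner, d_state, bytes_per_elem=2):
    """Approximate recurrent state size for Mamba-style layers.

    The state is d_inner x d_state per layer and does not depend on seq_len.
    """
    return n_ssm_layers * d_inner * d_state * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical 48-layer models: all-attention vs. 8 attention + 40 SSM layers.
    for seq_len in (4_096, 65_536, 1_048_576):
        full = kv_cache_bytes(48, n_kv_heads=8, head_dim=128, seq_len=seq_len)
        hybrid = (kv_cache_bytes(8, n_kv_heads=8, head_dim=128, seq_len=seq_len)
                  + ssm_state_bytes(40, d_inner=8192, d_state=128))
        print(f"{seq_len:>9} tokens: all-attention {full / 2**30:.2f} GiB, "
              f"hybrid {hybrid / 2**30:.2f} GiB")
```

Under these assumptions the remaining attention layers still need an ordinary KV cache, so caching does not go away in a hybrid; there are just fewer layers whose cache scales with context, plus a constant-size state for the SSM layers.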