I find it really interesting that it uses a Mamba-Transformer hybrid. Is it the only significant model right now that uses SSM layers, at least partially? This must contribute to lower VRAM requirements, right? Does it change how KV caching works?

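On the VRAM question, a rough back-of-the-envelope sketch may help (all sizes below are hypothetical placeholders, not this model's actual dimensions): an attention layer has to cache K and V for every past token, so its cache grows linearly with context length, whereas a Mamba/SSM layer carries a fixed-size recurrent state no matter how long the context gets. In a hybrid, only the attention layers contribute a growing KV cache.

```python
# Hypothetical sizes for illustration only; not taken from any specific model.

def attention_kv_cache_bytes(seq_len, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # K and V: two tensors of shape (seq_len, n_kv_heads, head_dim) per layer,
    # so the cache grows linearly with the number of tokens seen so far.
    return 2 * seq_len * n_kv_heads * head_dim * dtype_bytes

def ssm_state_bytes(d_model=4096, state_dim=16, conv_width=4, dtype_bytes=2):
    # A Mamba-style layer keeps a recurrent state of shape (d_model, state_dim)
    # plus a small convolution buffer of shape (d_model, conv_width); neither
    # depends on sequence length.
    return (d_model * state_dim + d_model * conv_width) * dtype_bytes

for seq_len in (1_024, 32_768, 131_072):
    kv = attention_kv_cache_bytes(seq_len)
    ssm = ssm_state_bytes()
    print(f"{seq_len:>7} tokens: attention cache ~ {kv / 2**20:8.1f} MiB/layer, "
          f"SSM state ~ {ssm / 2**20:.2f} MiB/layer")
```

Under these assumed dimensions the attention cache goes from a few MiB per layer at 1K tokens to hundreds of MiB per layer at 128K, while the SSM state stays constant, which is the usual argument for why replacing some attention layers with SSM layers lowers the memory footprint for long contexts.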