
Ah gotcha! In that case, I think Terminal-Bench is currently the best proxy for the "how good is this harness+agent combo at coding (quantitatively)" question. It used to be SWE-Bench, but T-Bench is a better proxy now. Like you said though, unfortunately Cursor isn't listed (probably their choice not to list it, maybe because it doesn't place highly).




Alright, I will try out Letta Code manually later then.

Cool, let us know what you think! Would recommend trying w/ Sonnet/Opus 4.5 or GPT-5.2 (those are the daily drivers we use internally w/ Letta Code).


