Ah gotcha! In that case, I think Terminal-Bench is currently the best proxy for ... | Hacker News

pacjam 8 days ago | on: Letta Code

Ah gotcha! In that case, I think Terminal-Bench is currently the best proxy for the "how good is this harness+agent combo at coding (quantitatively)" question. It used to be SWE-Bench, but T-Bench is a better proxy for this now. Like you said though, unfortunately Cursor isn't listed (probably their choice not to list it, maybe because it doesn't place highly).

koakuma-chan 8 days ago

Alright, I will try out Letta Code manually later then.

pacjam 8 days ago

Cool, let us know what you think! Would recommend trying w/ Sonnet/Opus 4.5 or GPT-5.2 (those are the daily drivers we use internally w/ Letta Code).
