Nice! Though for older hardware it would be nice if the price reflected the current second-hand market (harder to get data for, I know). E.g. the Nvidia RTX 3070 ranks as the second-best GPU in tok/s/$ even at the MSRP of $499, but you can get one for half that now.
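It seems like verification might need to be improved a bit? I looked at Mistral-Large-123B. Someone is claiming 12 tokens/sec on a single RTX 3090 at FP16. At FP16 the weights alone are about 246 GB (123B parameters × 2 bytes), roughly ten times the card's 24 GB of VRAM.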
Perhaps some filter could cut out submissions that don't really make sense?
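A rough sketch of what such a filter could look like, assuming each submission records the parameter count, precision, and GPU VRAM. The `Submission` fields and function names here are hypothetical, not taken from the site. It only checks whether the weights fit in VRAM, so a failing submission is better flagged for review than rejected outright, since CPU offloading can make such a run possible (just much slower).

```python
from dataclasses import dataclass

# Approximate bytes per parameter for common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass
class Submission:
    """Hypothetical shape of a benchmark submission."""
    model_params_b: float  # parameter count, in billions
    precision: str         # key into BYTES_PER_PARAM
    gpu_vram_gb: float     # VRAM per GPU, in GB
    gpu_count: int = 1


def weights_gb(sub: Submission) -> float:
    """Memory needed for the weights alone (ignores KV cache and activations)."""
    return sub.model_params_b * BYTES_PER_PARAM[sub.precision]


def fits_in_vram(sub: Submission) -> bool:
    """True if the weights fit in the combined VRAM of the listed GPUs."""
    return weights_gb(sub) <= sub.gpu_vram_gb * sub.gpu_count


# The submission in question: 123B parameters at FP16 on one 24 GB card.
suspect = Submission(model_params_b=123, precision="fp16", gpu_vram_gb=24)
print(weights_gb(suspect), fits_in_vram(suspect))
```

Anything where `fits_in_vram` is false and the claimed speed looks like a fully on-GPU run could be queued for manual review.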
Each sub-agent is executed as a separate CLI invocation (e.g. Cursor CLI or Claude Code), which means it gets a fresh model context window. The isolation is purely at the LLM context level; there is no process sandboxing or filesystem isolation.
The main agent passes only minimal inputs (file paths, task instructions), gets a concise result back, and keeps its own context clean.
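A minimal sketch of that pattern, under the assumption that the agent CLI accepts a prompt as its last argument and prints its answer to stdout. `run_subagent` and `cli_argv` are made-up names for illustration; substitute whatever non-interactive command your CLI actually exposes.

```python
import subprocess
import sys


def run_subagent(cli_argv, task, file_paths, timeout=300):
    """Run one sub-agent as a fresh CLI process and return its stdout.

    cli_argv is the command prefix for the agent CLI (an assumption here).
    Only the task text and file paths are passed in, so the sub-agent
    starts with a clean context; it still sees the same filesystem.
    """
    prompt = task + "\n\nFiles:\n" + "\n".join(file_paths)
    result = subprocess.run(
        cli_argv + [prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the sub-agent exits non-zero
    )
    # The caller keeps only this concise result, not the sub-agent's transcript.
    return result.stdout.strip()


# Stand-in "agent" for demonstration: a Python one-liner that echoes
# the first line of the prompt it receives.
fake_agent = [sys.executable, "-c", "import sys; print(sys.argv[1].splitlines()[0])"]
summary = run_subagent(fake_agent, "Summarize the module", ["src/app.py"])
print(summary)
```

Because each call is a new process with a new prompt, nothing from the main agent's conversation leaks in unless it is written into `task` or into one of the files.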