Give a coding agent some software. Ask it to write tests that maximise code coverage (source coverage if you have source code; if not, binary coverage). Consider using concolic fuzzing. Then give another agent the generated test suite, and ask it to write an implementation that passes. Automated software cloning. I wonder what results you might get?
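Something like this sketch is what I have in mind; run_agent here is just a stand-in for whatever wrapper you have around your agent of choice, not a real API:

```python
from typing import Callable, Dict

# Hypothetical agent wrapper: takes a prompt plus a dict of files,
# returns generated source. Not any real library's API.
AgentFn = Callable[[str, Dict[str, str]], str]

def clone_via_tests(original_src: str, run_agent: AgentFn) -> str:
    # Step 1: one agent writes a coverage-maximising suite against the original.
    tests = run_agent(
        "Write tests for this code that maximise code coverage.",
        {"original.py": original_src},
    )
    # Step 2: a second agent sees only the tests, never the original source,
    # and must write an implementation that passes them.
    return run_agent(
        "Write an implementation that makes this test suite pass.",
        {"tests.py": tests},
    )
```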
> Ask it to write tests that maximise code coverage
That is significantly harder to do than writing an implementation from tests, especially for codebases that previously didn't have any testing infrastructure.
Give a coding agent a codebase with no tests and tell it to write some, and it will; if you don’t tell it which framework to use, it will just pick one. There’s no denying you’ll get much better results if an experienced developer prompts it on how to test than if you just let it decide for itself.
If you’ve actually tried this, and actually read the results, you know this does not work well. It might write a few decent tests, but get ready for an impressive number of tests and cases with no real coverage.
I did this literally 2 days ago and it churned for a while and spat out hundreds of tests! Great news, right? Well, no, they did stupid things like “create an instance of the class (new MyClass), now make sure it’s the right class type”. It also created multiple tests that created maps, then asserted the values existed and matched… matched the maps it created in the test… without ever touching the underlying code it was supposed to be testing.
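Reconstructed from memory (in Python rather than the original language, with MyClass and the dict values as placeholders), the kind of thing it produced looked roughly like this:

```python
import unittest

class MyClass:
    # Stand-in for the real class that was supposedly under test.
    pass

class TestMyClass(unittest.TestCase):
    def test_creates_instance(self):
        # "Tests" the constructor by checking the object's own type.
        obj = MyClass()
        self.assertIsInstance(obj, MyClass)

    def test_map_values_match(self):
        # Builds a dict in the test and asserts against the same dict;
        # the code that is supposedly being tested is never called.
        expected = {"status": "ok", "count": 3}
        actual = {"status": "ok", "count": 3}
        self.assertEqual(expected["status"], actual["status"])
        self.assertEqual(expected["count"], actual["count"])

if __name__ == "__main__":
    unittest.main()
```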
I’ve tested this on new codebases, old codebases, and vibe-coded codebases. The results vary slightly, and you absolutely can use LLMs to help with writing tests, no doubt, but “just throw an agent at it” does not work.
But, did you actually give the agent access to a tool to measure code coverage?
If it can't measure whether it is succeeding in increasing code coverage, no wonder it doesn't do a great job of it.
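Exposing a concrete number is cheap. A minimal sketch using coverage.py for a Python project, where "mypackage" and "tests/" are placeholders for your own layout:

```python
import coverage
import pytest

# Measure coverage of the package under test while running the suite in-process,
# then hand the resulting number back to the agent as feedback.
cov = coverage.Coverage(source=["mypackage"])
cov.start()
exit_code = pytest.main(["-q", "tests/"])
cov.stop()
cov.save()

total = cov.report()  # prints a per-file report and returns the overall percentage
print(f"pytest exit code: {exit_code}, total line coverage: {total:.1f}%")
```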
Also, it can help if you have a pair of agents (which could even be just two different instances of the same agent with different prompting) – one to write tests, and one to review them. The test-writing agent writes tests, and submits them as a PR; the PR-reviewing agent reads the PR and provides feedback; the test-writing agent updates the tests in response to the feedback; iterate until the PR-reviewing agent is satisfied. This can produce much better tests than just an agent writing tests without any automated review process.
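In sketch form (write_tests and review_tests are the two agent instances, passed in as plain callables; none of this is a real framework):

```python
from typing import Callable, Optional

WriteFn = Callable[[str, Optional[str]], str]   # (codebase, feedback) -> tests
ReviewFn = Callable[[str, str], Optional[str]]  # (codebase, tests) -> feedback or None

def test_review_loop(codebase: str, write_tests: WriteFn,
                     review_tests: ReviewFn, max_rounds: int = 5) -> str:
    # The writer drafts a suite, the reviewer critiques it, and the writer
    # revises until the reviewer has no more feedback (or we hit max_rounds).
    tests = write_tests(codebase, None)
    for _ in range(max_rounds):
        feedback = review_tests(codebase, tests)
        if feedback is None:  # reviewer is satisfied
            break
        tests = write_tests(codebase, feedback)
    return tests
```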
This highlights something I wish were more prevalent: path coverage. I'm not sure which testing suites handle path coverage, but I know XDebug for PHP could manage it back when I was doing PHP work. Simple line coverage doesn't tell you enough of the story, while path coverage should let you be sure you've tested all the code paths of a unit. Mix that with input fuzzing and you should be able to develop comprehensive unit tests for the critical units in your codebase. Yes, I'm aware that's just one part of a larger puzzle. The toy example below shows the difference.
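To make the line-versus-path distinction concrete (in Python rather than PHP/XDebug, just to keep it short):

```python
def label(a: bool, b: bool) -> str:
    result = ""
    if a:
        result += "A"
    else:
        result += "x"
    if b:
        result += "B"
    else:
        result += "y"
    return result

# These two tests execute every line and every branch arm at least once,
# so line (and even branch) coverage reports 100%...
assert label(True, True) == "AB"
assert label(False, False) == "xy"
# ...yet two of the four possible paths through the function, (True, False)
# and (False, True), are never exercised. Path coverage would flag that.
```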