> We write unit tests for the happy path, maybe a few edge cases we can imagine, but what about the inputs we'd never consider?

Many times we assume that LLMs handle these scenarios by default. I've seen companies advertise with LLM-generated claims (~"Best company for X according to ChatGPT"), and I've seen (political) discussions held with LLM opinions as "evidence".
So it's pretty safe to say that some (many?) people give LLM outputs inappropriate credence. It's eating our minds.
The original claim for TDD is that you write tests for all your edge cases. Inputs you didn't consider don't matter, because they're covered by the edge cases. If you can only accept inputs from 2-7 (inclusive), you check 1, 2, 7, and 8 - if those pass, you assume the rest work.
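A minimal sketch of that boundary-value check in Rust, assuming a hypothetical `accept` function that only takes values in 2..=7:

```rust
// Hypothetical function under test: it only accepts values 2..=7 (inclusive).
fn accept(n: i32) -> Result<i32, String> {
    if (2..=7).contains(&n) {
        Ok(n)
    } else {
        Err(format!("{n} is outside 2..=7"))
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Boundary-value tests: just outside and just on each edge of the range.
    #[test]
    fn rejects_one_below_lower_bound() {
        assert!(accept(1).is_err());
    }

    #[test]
    fn accepts_lower_bound() {
        assert_eq!(accept(2), Ok(2));
    }

    #[test]
    fn accepts_upper_bound() {
        assert_eq!(accept(7), Ok(7));
    }

    #[test]
    fn rejects_one_above_upper_bound() {
        assert!(accept(8).is_err());
    }
}
```

If those four pass, the assumption is that everything strictly between the bounds behaves the same way.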
Since I work in strongly typed languages, the last two will fail to compile and are thus not worth the bother - those who don't have that luxury of course need to test the edge cases that apply to them. The first are a maybe; in my experience they are rarely a problem, but we need to go from the abstract to the particular algorithm before we can discuss whether they are a potential problem or not.
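The specific cases from the earlier comment aren't quoted here, but as a rough sketch of the "fails to compile" point, assume a hypothetical `SmallRange` newtype for the 2-7 example above: anything that isn't built through the checked constructor never reaches the rest of the program.

```rust
// Hypothetical newtype: the only way to obtain a SmallRange is via the
// checked constructor, so downstream code never sees an out-of-range value.
pub struct SmallRange(u8);

impl SmallRange {
    pub fn new(n: u8) -> Option<SmallRange> {
        (2..=7).contains(&n).then_some(SmallRange(n))
    }

    pub fn get(&self) -> u8 {
        self.0
    }
}

fn takes_range(v: SmallRange) -> u8 {
    v.get()
}

fn main() {
    // Compiles: the value went through the checked constructor.
    if let Some(v) = SmallRange::new(5) {
        println!("{}", takes_range(v));
    }

    // Would not compile: a bare integer or a string is not a SmallRange,
    // so those "edge cases" never become runtime tests.
    // takes_range(5);
    // takes_range("seven");
}
```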
What’s interesting to me about this, reckless as it is, is that the conversation has begun to shift toward balancing LLMs with rigorous methods. These people seem to be selling some kind of AI hype product backed by shoddy engineering, and even they are picking up on the vibe. I think this is a really promising sign for the future.
Do we?