
The funny part of “AI will make formal verification go mainstream” is that it skips over the one step the industry still refuses to do: decide what the software is supposed to do in the first place.

We already have a ton of orgs that can’t keep a test suite green or write an honest invariant in a code comment, but somehow we’re going to get them to agree on a precise spec in TLA+/Dafny/Lean and treat it as a blocking artifact? That’s not an AI problem, that’s a culture and incentives problem.

Where AI + “formal stuff” probably does go mainstream is at the boring edges: property-based tests, contracts, refinement types, static analyzers that feel like linters instead of capital‑P “Formal Methods initiatives”. Make it look like another checkbox in CI and devs will adopt it; call it “verification” and half the org immediately files it under “research project we don’t have time for”.
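
As a concrete example of that "boring edges" flavor: a property-based test reads like a lightweight spec but runs as an ordinary CI check. A minimal Rust sketch, assuming the `proptest` crate as a dev-dependency (`dedup_sorted` is a hypothetical function under test):

    use proptest::prelude::*;

    // Hypothetical function under test: sort and de-duplicate.
    fn dedup_sorted(mut v: Vec<u32>) -> Vec<u32> {
        v.sort_unstable();
        v.dedup();
        v
    }

    proptest! {
        // The invariant reads like a spec but runs like a unit test.
        #[test]
        fn output_is_strictly_sorted(v in proptest::collection::vec(any::<u32>(), 0..100)) {
            let out = dedup_sorted(v.clone());
            prop_assert!(out.windows(2).all(|w| w[0] < w[1])); // strictly increasing
            prop_assert!(out.iter().all(|x| v.contains(x)));   // no invented elements
        }
    }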






> it skips over the one step the industry still refuses to do: decide what the software is supposed to do in the first place.

Not only that, but it's been well-established that a significant challenge with formally verified software is to create the right spec -- i.e. one that actually satisfies the intended requirements. A formally verified program can still have bugs, because the spec (which requires specialized skills to read and understand) may not satisfy the intent of the requirements in some way.

So the fundamental issue/bottleneck that emerges is the requirements <=> spec gap, which closing the spec <=> executable gap does nothing to address. Translating people's needs to an empirical, maintainable spec of one type or another will always require skilled humans in the loop, regardless of how easy everything else gets -- at minimum as a responsibility sink, but even more as a skilled technical communicator. I don't think we realize how valuable it is to PMs/executives and especially customers to be understood by a skilled, trustworthy technical person.


> A formally verified program can still have bugs, because the spec (which requires specialized skills to read and understand) may not satisfy the intent of the requirements in some way.

That's not a bug, that's a misunderstanding, or at least an error of translation from natural language to formal language.

Edit:

I agree that one can categorize incorrect program behavior as a bug (apparently there's such a thing as "behavioral bug"), but to me it seems to be a misnomer.

I also agree that it's difficult to tell that to a customer when their expectations aren't met.


In some definitions (which I happen to agree with, but which aren't present much in public discourse anymore because we wanted to save money, first by not properly training testers and then by getting rid of them), the purpose of testing (or, better said, quality control) is:

1) Verify requirements => this can be done with formal verification.

2) Validate fitness for purpose => this is where we make sure that if the customer needs addition, it does not matter that our software does subtraction very well and has a valid proof of doing so according to the spec (see the sketch below).

I know this second part kind of got lost in the transition from "oh my god, waterfall is bad" to "yay, now we can fire all the testers because quality is the responsibility of the entire team."
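
To make the verify-vs-validate split concrete, here's a minimal Rust sketch (names hypothetical, using the Kani model checker's harness syntax, which comes up further down the thread). The function is provably correct against its written spec, yet still fails validation if the customer actually needed addition:

    /// Spec: returns a - b, saturating at zero.
    fn compute(a: u32, b: u32) -> u32 {
        a.saturating_sub(b)
    }

    #[cfg(kani)]
    #[kani::proof]
    fn verify_compute() {
        let (a, b): (u32, u32) = (kani::any(), kani::any());
        kani::assume(b <= a); // spec precondition
        // Postcondition: subtraction is inverted by addition. This holds
        // for all inputs, i.e. step 1 (verification) succeeds.
        assert_eq!(compute(a, b) + b, a);
        // Step 2 (validation) can still fail: if the requirement was
        // addition, a verified subtraction is provably doing the wrong thing.
    }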


>an error of translation from natural language to formal language

Really? Programming languages are all formal languages too, which would mean no human-made error in an algorithm counts as a "bug" anymore. Some projects even categorize typos as bugs, so that's an unusually strict definition of "bug" in my opinion.


Sure, I guess you can understand what I said that way, but that's not what I meant. I wasn't thinking about the implementation, but the specifications.

Reread the quote I was referring to if you need more context to understand my comment.

If you have good formal specifications, you should be able to produce the corresponding code. Any error in that phase should be considered a bug, and yes, a typo should fit that category, if it makes the code deviate from the specs.

But an error in the step of translating the requirements (usually explained in natural language) to specifications (usually described formally) isn't a bug, it's a translation error.


The danger of this is people start asking about formally verified specs, and down that road lies madness.

"If you can formally verify the spec the code can be auto-generated from it."


Most formal "specs" (the part that defines the system's actual behavior) are just code. So a formally verified (or compiled) spec is really just a different programming language, or something layered on top of existing code. TypeScript types, for example, are a non-formal but empirical verification layer on top of JavaScript.

The hard part remains: translating from human-communicated requirements to a maintainable spec (formally verified or not) that completely defines the module's behavior.
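
One pattern that makes "the spec is just code" concrete: write the spec as a small, obviously correct reference implementation and check the production version against it. A sketch (`spec_sum` and `fast_sum` are hypothetical names):

    // The "spec" is itself code: naive, slow, obviously correct.
    fn spec_sum(xs: &[u64]) -> u64 {
        xs.iter().fold(0, |acc, &x| acc.wrapping_add(x))
    }

    // Production version: same behavior, different shape (chunked,
    // standing in for a hypothetically faster implementation).
    fn fast_sum(xs: &[u64]) -> u64 {
        xs.chunks(4)
            .map(|c| c.iter().fold(0, |acc, &x| acc.wrapping_add(x)))
            .fold(0, |acc, x| acc.wrapping_add(x))
    }

    #[test]
    fn fast_sum_refines_spec() {
        let xs: Vec<u64> = (0..1000).collect();
        assert_eq!(fast_sum(&xs), spec_sum(&xs));
    }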


> decide what the software is supposed to do in the first place.

That's where the job security is (and always has been). This has been my answer to "are you afraid for your job because of AI?"

Writing the code is very rarely the hard part. The hard part is getting a spec from the PM, or gathering requirements from stakeholders. And then telling them why the spec / their requirements don't make sense or aren't feasible, and figuring out ones that will actually achieve their goals.


There are some basic invariants like "this program should not crash on any input" or "this service should be able to handle requests that look like X up to N per second" — though I expect those will be the last to be amenable to formal verification, they are also very simple ones that (when they become possible) will be easy to write down.

> "this program should not crash on any input" [...] though I expect those will be the last to be amenable to formal verification,

In the world of Rust, this is actually the easiest level of formal proof to achieve.

Simple lints can eliminate panics and potentially-panicking operations, forcing you (or the LLM) to use variants with runtime error handling: e.g. `s[i]` becomes `s.get(i).ok_or(MyError::RuhRoh)?`, or more purpose-specific handling; the same goes for enforcing that arithmetic never underflows/overflows.
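
For reference, those lints can be enforced with crate-level attributes roughly like this (the lint names are real clippy lints; `MyError` is the hypothetical error type from above):

    // Any indexing, unwrap/expect, or overflow-prone arithmetic becomes
    // a hard clippy error instead of a potential runtime panic.
    #![deny(clippy::indexing_slicing)]
    #![deny(clippy::unwrap_used, clippy::expect_used)]
    #![deny(clippy::arithmetic_side_effects)]

    #[derive(Debug)]
    enum MyError { RuhRoh }

    fn nth(s: &[u8], i: usize) -> Result<u8, MyError> {
        // `s[i]` would trip the lint; the fallible form is forced instead:
        s.get(i).copied().ok_or(MyError::RuhRoh)
    }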

Kani symbolically evaluates simple Rust functions and ensures that the function does not panic on any possible input value; on top of that, you can add invariants to be enforced (e.g. searching for an item in an array always returns either None or a valid index, and the value at that index fulfills the search criteria).

(The real challenge with e.g. Kani is structuring a codebase such that it has those simple-enough subparts where formal methods are feasible.)
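
For illustration, the search invariant described above might look like this as a Kani harness (`find` is a hypothetical function under proof):

    // Hypothetical function under proof: first index of `needle`, if any.
    fn find(haystack: &[u32], needle: u32) -> Option<usize> {
        haystack.iter().position(|&x| x == needle)
    }

    #[cfg(kani)]
    #[kani::proof]
    fn find_returns_none_or_valid_index() {
        let haystack: [u32; 8] = kani::any(); // symbolic array: all values at once
        let needle: u32 = kani::any();
        if let Some(i) = find(&haystack, needle) {
            assert!(i < haystack.len());     // the index is in bounds...
            assert_eq!(haystack[i], needle); // ...and fulfills the search criterion
        }
        // (The None case, i.e. the needle is absent, is also provable
        // but omitted for brevity.)
    }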


Yeah, the vast majority of the history of "getting things done" has been: find some guy who can translate "make the crops grow" into a pile of food.

The people who care about the precise details have always been relegated to a tiny minority, even in our modern technological world.


The OP doesn't seem broadly applicable to corporate software development.

Rather, it's directed at niche, mission-critical things, not all of which get the formal verification they need, and/or for which it isn't even considered because of the high cost (of the specialized skills required).

I read the OP as a realization that the costs have fallen, and thus we should see formal verification used more than before.


This is the article's message as well:

"That doesn’t mean software will suddenly be bug-free. As the verification process itself becomes automated, the challenge will move to correctly defining the specification: that is, how do you know that the properties that were proved are actually the properties that you cared about? Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress."

General security properties come to mind as one area where specs could be highly reusable.


"decide what the software is supposed to do in the first place."

After 20 years of software development, I think that's because most software out there is itself the method of finding out what it's supposed to do.

Incomplete specs aren't missing feature requirements due to a lack of discipline; it's because nobody can know what the software should be without trying it out.

I mean, of course there is a subset of all software that can be specified beforehand - but a lot of it cannot.

Knuth could be that forward-thinking with TeX, for example, only because he had 500 years of book-printing tradition to fall back on when backporting the specs to math.



