The discussion has become contentious, which is very unfortunate: there's clearly some confusion about Prolog here, and confusion is always a great opportunity to learn.
You say:
>> Yes, though, it traverses the code tree by depth first walk.
Here's what I suggest: try to think what, exactly, is the data structure searched by Depth First Search during Prolog's execution.
You'll find that this structure is what we call an SLD-tree. That's a tree where the root is a Horn goal that begins the proof (i.e. the thing we want to disprove, since we're doing a proof by refutation); every other node is a new goal derived during the proof; every branch is a Resolution step between one goal and one definite program clause from a Prolog program; and every leaf of a finite branch is either the empty clause, signalling the success of the proof by refutation, or a non-empty goal that cannot be further reduced, which signals the failure of that branch of the proof. So that's basically a proof tree and the search is ... a proof.
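To make that concrete, here's a toy example (my own, not from the thread): a four-clause definite program and a goal, with the SLD-tree that Prolog searches depth-first, left-to-right, sketched in the comments.

    % A small definite program:
    parent(tom, bob).
    parent(bob, ann).
    ancestor(X, Y) :- parent(X, Y).                    % clause 1
    ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).    % clause 2

    % The SLD-tree for the goal ?- ancestor(tom, ann).
    %
    %            ?- ancestor(tom, ann).
    %            /                    \
    %      clause 1                    clause 2
    %         |                           |
    % ?- parent(tom, ann).    ?- parent(tom, Z), ancestor(Z, ann).
    %         |                           | fact, Z = bob
    %  (leaf: no such fact,       ?- ancestor(bob, ann).
    %   this branch fails)                | clause 1
    %                              ?- parent(bob, ann).
    %                                     | fact
    %                                    [] (empty clause: refutation found)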
So Prolog is not just searching a list to find an element, say. It's searching a proof tree to find a proof. It just so happens that searching a proof tree to find a proof corresponds to the execution of a program. But while you can use a search to carry out a proof, not every search is a proof. You have to get your ducks in a row the right way around; otherwise, yeah, all you have is a search. This is not magick, it's just ... computer science.
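If you want to watch that search happen, the classic three-clause "vanilla" meta-interpreter spells it out. This is standard Prolog folklore, not anything from this thread, and note that on systems that enforce the ISO behaviour of clause/2 you'd have to declare the program's predicates dynamic for it to inspect them.

    % solve(G) succeeds iff the program proves goal G.
    solve(true).                          % empty goal: the refutation is complete
    solve((A, B)) :- solve(A), solve(B).  % conjunction: prove A, then prove B
    solve(A) :-
        clause(A, Body),                  % one Resolution step: pick a matching clause
        solve(Body).                      % recurse on the derived goal

    % ?- solve(ancestor(tom, ann)).
    % true.

Backtracking into clause/2 there is exactly the depth-first walk over the SLD-tree sketched above.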
It should go without saying that you can do the same thing with Python, or with JavaScript, or with any other Turing-complete language, but then you'd basically have to re-invent Prolog and implement it in that other language: an ad-hoc, informally specified, bug-ridden and slow implementation of half of Prolog, most likely.
This is all without examining whether you can fix LLMs' lack of reasoning by funnelling their output through a Prolog interpreter. I personally don't think that's a great idea. Let's see, what was that soundbite... "intelligence is shifting the test part of generate-test into the generate part" [1]. That's clearly not what pushing LLM output into a Prolog interpreter achieves. Clearly, if good, old-fashioned symbolic AI is to be combined with statistical language modelling, that has to happen much earlier in the language modelling process, not when the modelling is already done and dusted and all we have is a language model that is only statistical. Like putting the bubbles in the soda before you serve the drink, not after: the logic has to go into the language modelling before the modelling is done. Otherwise there's no way I can see that the logic can control the modelling. Then all you have is generate-and-test, and it's meh as usual. Note, though, that much recent work on carrying out mathematical proofs with LLMs does exactly that, e.g. DeepMind's AlphaProof. Generate-and-test works, it's just dumb and inefficient, and you can only really make it work if you have resources on the scale of DeepMind's.
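For a feel of just how dumb and inefficient, here's the textbook generate-and-test example in Prolog itself, permutation sort (the predicate names are my own; permutation/2 is from SWI-Prolog's library(lists)): generate a candidate, test it, backtrack and regenerate on failure. It's correct, and it's O(n!) in the worst case.

    % Generate-and-test sorting: correct, but worst-case O(n!).
    gt_sort(List, Sorted) :-
        permutation(List, Sorted),    % generate: a candidate ordering
        ascending(Sorted).            % test: reject it if it isn't sorted

    ascending([]).
    ascending([_]).
    ascending([X, Y|Rest]) :- X =< Y, ascending([Y|Rest]).

    % ?- gt_sort([3, 1, 2], S).
    % S = [1, 2, 3].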
_____________
[1] Marvin Minsky, via Subbarao Kambhampati and students: https://arxiv.org/html/2504.09762v1