
I've been thinking a lot about this, and I want to build the following experiment, in case anyone is interested:

The experiment is to have an LLM play plman [0] with and without Prolog's help.

plman is a Pac-Man-like game for learning Prolog. It was written by professor Francisco J. Gallego of the University of Alicante to teach the logic course in computer science.

Basically, you write a solution in Prolog for a map, and plman executes it step by step so you can visually see the pac-man (plman) moving around the maze, eating dots and avoiding ghosts and other traps.

There is an interesting dynamic around finding keys for doors and timing-based traps.

There are different levels of complexity, and you can also easily write your own maps, since they are just ASCII characters in a text file.
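
To give an idea of the representation, here is roughly how I'd turn one of those map files into a state object for the experiment. The character conventions ('#' wall, '@' plman, 'G' ghost, 'K' key, 'D' door, '.' dot) are my own placeholders, not necessarily plman's actual symbols:

    # Sketch: parse an ASCII maze into positions the tooling can reason about.
    # The character set below is a placeholder, not plman's real format.
    def parse_map(text: str) -> dict:
        state = {"walls": set(), "ghosts": [], "keys": [], "doors": [],
                 "dots": set(), "plman": None}
        for y, row in enumerate(text.splitlines()):
            for x, ch in enumerate(row):
                if ch == '#':
                    state["walls"].add((x, y))
                elif ch == '@':
                    state["plman"] = (x, y)
                elif ch == 'G':
                    state["ghosts"].append((x, y))
                elif ch == 'K':
                    state["keys"].append((x, y))
                elif ch == 'D':
                    state["doors"].append((x, y))
                elif ch == '.':
                    state["dots"].add((x, y))
        return state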

I thought this was the perfect project to visually explain to my coworkers the limits of LLM "reasoning" and what symbolic reasoning is.

So far I have hooked up the ChatGPT API to try to solve scenarios, and it fails even with a substantial number of retries. That's what I was expecting.
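
Roughly, that baseline just hands the raw ASCII map to the model and asks for a complete move list, feeding failures back on each retry. A sketch of it (model name, prompt wording and move format are placeholders; the returned moves still have to be executed by plman itself):

    # Sketch of the baseline: ask the model for a full move list for a map,
    # optionally feeding back why previous attempts failed.
    from openai import OpenAI

    client = OpenAI()

    def propose_moves(ascii_map: str, previous_failures=()) -> list[str]:
        prompt = ("This is a Pac-Man-like maze. Reply ONLY with a comma-separated "
                  "list of moves (up, down, left, right) that eats every dot and "
                  "avoids the ghosts.\n\n" + ascii_map)
        if previous_failures:
            prompt += "\n\nPrevious failed attempts:\n" + "\n".join(previous_failures)
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return [m.strip() for m in resp.choices[0].message.content.split(",")]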

The next step would be to write an MCP tool so that the LLM can navigate the problem by using the tool, but this is where I need guidance.

I'm not sure about the best setup to demonstrate the usefulness of Prolog in a way that goes beyond what context retrieval or a DB query could do.

I'm not sure if the LLM should write the Prolog solution itself. I want to avoid building something trivial like the LLM just asking for the already-solved steps, so my intuition is that I need some sort of virtual-joystick MCP that hides Prolog from the LLM: the LLM would have access to the current state of the screen and could ask questions like "what would my position be if I moved up?", "where will the ghost be on its next move?", "where is the door relative to my current position?"
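
Something like this is the virtual joystick I have in mind, as a very rough sketch. It assumes the official mcp Python SDK (FastMCP) plus pyswip to talk to SWI-Prolog, and the predicates it queries (screen_text/1, next_position/3, ghost_next/2) are not part of plman; they are rules I would still have to write on top of the map:

    # Rough sketch of a "virtual joystick" MCP server. The LLM only sees
    # these tools and their answers; all spatial reasoning behind them is
    # done in Prolog. Predicate names and the rules file are hypothetical.
    from mcp.server.fastmcp import FastMCP
    from pyswip import Prolog

    prolog = Prolog()
    prolog.consult("joystick_rules.pl")  # hypothetical rules wrapping the plman map

    mcp = FastMCP("plman-joystick")

    @mcp.tool()
    def screen() -> str:
        """Current maze as ASCII text, with no Prolog visible."""
        sols = list(prolog.query("screen_text(Text)"))
        return sols[0]["Text"] if sols else ""

    @mcp.tool()
    def if_i_move(direction: str) -> str:
        """Where would plman end up after moving in this direction?"""
        sols = list(prolog.query(f"next_position({direction}, X, Y)"))
        return f"({sols[0]['X']}, {sols[0]['Y']})" if sols else "blocked"

    @mcp.tool()
    def ghost_next_move() -> str:
        """Predicted ghost position on the next tick."""
        sols = list(prolog.query("ghost_next(X, Y)"))
        return f"({sols[0]['X']}, {sols[0]['Y']})" if sols else "unknown"

    if __name__ == "__main__":
        mcp.run()

The point is that the LLM never sees Prolog: it only gets the screen and answers to spatial questions, while the symbolic work happens behind the tools.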

I don't have the academic background to design this experiment properly. It would be great if anyone is interested in working on this together, or could give me some advice.

Prior work still on my reading list:

- LoRP: LLM-based Logical Reasoning via Prolog [1]

- A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models [2]

- [0] https://github.com/Matematicas1UA/plman/blob/master/README.m...

- [1] https://www.sciencedirect.com/science/article/abs/pii/S09507...

- [2] https://arxiv.org/html/2411.18564v1


