
> I see remarks like this a lot, and I don't know what to do with them.

There's no "reasoning loop" built into LLMs yet. Keyword, yet. For now we're left with single-shot answers from "memory" rather than a reasoning loop akin to what a human would do, which is read the docs, try some stuff out, discover that what you're asking for is impossible, and then telling you that it isn't possible.



Alternatively, the prior on "this is not possible" is very low because RLHF & Friends have targeted metrics that, inadvertently or not, discourage that outcome.


I think that's the right answer - human trainers prefer an answer, even a made up one, to "I don't know".


Dataset as well. On a forum, if you don't know the answer you simply don't post; only people who think they know will post an answer. In a dialogue you see a lot more "I don't know", since there people are expected to respond, but there isn't a lot of dialogue data to be found on the internet compared to open forum data.


Amazon product Q&A has a lot of "I don't know" answers. Unlike just about everywhere else on the internet.


Any agent that uses a tool (i.e. a REPL or APIs) essentially has a reasoning loop just like this. Microsoft Research recently got their Autogen framework up to the point where the results are much better than AutoGPT/BabyGPT and the like through more optimal tool use, showing the ability to do rather sophisticated research and problem solving.

This new research focuses on responses generated just from the model's own corpus of knowledge. And the result totally makes sense if that's all you do, without giving a hint about what is incorrect, if indeed anything is incorrect. It's akin to asking a child "are you sure?" after they produce an answer they feel sure about, and then receiving a different answer out of a state of confusion.

The main takeaway for me is that if you can get better performance on follow-up queries simply by telling the model to review its approach more carefully, then your initial prompt wasn't as effective as it could have been.
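
To make that concrete, here's a minimal sketch of that kind of "review your approach" follow-up pass, assuming the pre-1.0 openai Python client; the wording of the nudge is just an illustration, not anything from the article:

  import openai

  def answer_with_review(question, model="gpt-4"):
      # first pass: just ask the question
      messages = [{"role": "user", "content": question}]
      first = openai.ChatCompletion.create(model=model, messages=messages)
      draft = first.choices[0].message.content

      # second pass: ask the model to review its own approach before answering again
      messages += [
          {"role": "assistant", "content": draft},
          {"role": "user", "content": "Carefully review your approach above, point out "
                                      "any mistakes, then give a corrected final answer."},
      ]
      second = openai.ChatCompletion.create(model=model, messages=messages)
      return second.choices[0].message.content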


It's trivially easy to build a reasoning loop using the GPT-4 API.


How?


example:

Initial Prompt:

"Here is the schema for a database: CREATE TABLE persons ( id INTEGER PRIMARY KEY AUTOINCREMENT, first_name TEXT NOT NULL, last_name TEXT NOT NULL, age INTEGER NOT NULL ); CREATE TABLE Y(blah blah blah)

I'll pose a question and I want you to: Respond with JSON which has two fields, Query and Error. Query: A SQL query to get the required information Error: An error message to pass back to the user if it's not possible.

Question: Show me all the people who are over 40. "

Response from prompt: {Query: "Select years_old from persons where age > 40", Error:""}

Now, in your "agent": Get that response, run it against your db, get the error message. Go back to the GPT-4 api with the initial prompt and response, and add "this doesn't work and gives the following error message. Correct your response and responde in JSON again."

And so on.
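
Here's roughly what that loop looks like in code. This is a minimal sketch, assuming the pre-1.0 openai Python client, a local SQLite database, and that the model actually returns valid JSON (you'd want to handle parse failures too); the table, file name, and retry limit are just illustrative:

  import json
  import sqlite3
  import openai

  SYSTEM = (
      "Here is the schema for a database:\n"
      "CREATE TABLE persons (id INTEGER PRIMARY KEY AUTOINCREMENT, "
      "first_name TEXT NOT NULL, last_name TEXT NOT NULL, age INTEGER NOT NULL);\n"
      "I'll pose a question. Respond with JSON with two fields, Query and Error. "
      "Query: a SQL query to get the required information. "
      "Error: an error message for the user if it's not possible."
  )

  def ask(question, db_path="people.db", max_attempts=3):
      conn = sqlite3.connect(db_path)
      messages = [
          {"role": "system", "content": SYSTEM},
          {"role": "user", "content": "Question: " + question},
      ]
      for _ in range(max_attempts):
          reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
          content = reply.choices[0].message.content
          messages.append({"role": "assistant", "content": content})
          parsed = json.loads(content)
          if parsed.get("Error"):
              return None, parsed["Error"]  # the model says it's not possible
          try:
              return conn.execute(parsed["Query"]).fetchall(), ""
          except sqlite3.Error as e:
              # feed the real error back so the model can correct itself
              messages.append({
                  "role": "user",
                  "content": "This doesn't work and gives the following error: "
                             f"{e}. Correct your response and respond in JSON again.",
              })
      return None, "gave up after {} attempts".format(max_attempts)

The model never needs to be right on the first try; the database's own error message is what steers the correction.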


People who are having trouble getting good results from LLMs are still trying to figure them out. Most folks are using the product and not the model. Where it really shines is when you use the API and build a platform of workflows around it, starting with a really solid prompt template and working your way through various methods to achieve an outcome. There is no magic wand to get a full solution in one shot, but guiding the LLM towards that ultimate outcome is the type of thing a lot of nerds don't have the attention span for, so they move on to the next weekly JS framework to get their fix instead.

Which is good for us who stick with the process.


Yup, agree.


These models produce correct answers to many problems that require “reasoning” (for any sensible meaning of the word) and that are not in their training set.


It’s also unclear that LLMs have no “reasoning loop”, or that a “loop” abstraction is necessary for all reasoning, or that eyeballing wrong answers is a sufficient metric to categorically dismiss “reasoning." A "reasoning loop" argument is especially odd when applied to LLMs...which explicitly have architectural loops, and their output generation is a literal loop.
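
To be concrete about the "literal loop": generation is autoregressive, so each output token is appended to the input and fed back in. A bare-bones greedy version with a Hugging Face causal LM (model choice and step count are arbitrary) looks like this:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  ids = tok("The capital of France is", return_tensors="pt").input_ids
  for _ in range(10):  # the loop: each new token is fed back in as input
      logits = model(ids).logits
      next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
      ids = torch.cat([ids, next_id], dim=-1)
  print(tok.decode(ids[0]))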

Folks, we don't understand our own minds. You think we already understand a potential alien mind? With our n = 1 example of a generally intelligent mind architecture (ours)? Fat chance.

Anyone confidently claiming sweeping, nebulous conclusions about these new models is likely revealing more about their own biases than about the models' inner workings. We just don't know much. Hot takes on ML are just Rorschach tests for anthropocentrism.


> A "reasoning loop" argument is especially odd when applied to LLMs...which explicitly have architectural loops

Oh, I understood that the current crop of LLMs didn't have a way to push data back into themselves. I know so little about LM architecture at this point. I need to work my way through the free AI course that's out there. Maybe a tutorial or two on building one from scratch.


That just shows that parroting will solve a number of problems that require reasoning; that is, "reasoning" as we think of it can be reduced to a statistical process.

Kids parrot their parents before they understand the meaning of what they are doing or saying. Meaning arises from a much more complex process than seeing/repeating. I think the same will be true with LLMs. True reasoning capabilities will be another revolution entirely.

---

This just occurred to me: LLMs have a cargo-cult level of understanding of anything they've been trained on. Correct answers are actually statistical flukes -- purposeful, because that's how we trained the models -- but not actually significant in terms of reasoning.


How do we test this alleged distinction between "true reasoning" and statistical parroting? What experiments can we perform on SotA models to make this idea falsifiable?

Commonplace hypotheses like these give me strong "No True Scotsman" vibes, but I'm often wrong. Let's agree on a method, and I'll test it.


There is none. No one understands what intelligence or self-awareness really means, which is why people feel threatened by anything that challenges their self-proclaimed uniquely human trait. The goalposts WILL move a lot more before this debate is settled. Inevitably the success rate and accuracy of LLM responses will get better and better, quite possibly to the point where they are indistinguishable from a human's. But it won't matter, because some humans will redefine intelligence to be something else so that their place at the top of the intelligence pyramid remains in place.


You can coax LLMs into reasoning. I recall someone on here posting a link to prompts that instruct the LLM to reason through a request, and this improves its output significantly.
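
The simplest version of that is the zero-shot chain-of-thought trick: append something like "let's think step by step" before asking for the final answer. A toy sketch with the pre-1.0 openai client (the exact wording is just an illustration):

  import openai

  question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
              "than the ball. How much does the ball cost?")

  # the extra instruction nudges the model to reason before answering
  reply = openai.ChatCompletion.create(
      model="gpt-4",
      messages=[{"role": "user",
                 "content": question + "\nLet's think step by step, then state the final answer."}],
  )
  print(reply.choices[0].message.content)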

I think what our models are missing is recursion on themselves. You and I can be self-referential, and we are capable of meta thought. We are also capable of "internal dialogue" where we speak and reason internally.

The LLMs at present lack even state or memory. Arguably those aren't necessary for "reasoning" capabilities.

I wonder how far off "stochastic parrot" is from what we do naturally. I have an image in my head of how we think, using associated words/concepts/pictures for learning, and I can't imagine it's too different from statistically associated concepts/words.

---

This is sort of scatterbrained, and I apologize. I don't have enough time to write a more concise response.


That doesn’t mean it does any “reasoning”. It generates a response text that looks like a response to the input text. The whole point is that it generates responses that aren’t in the training set. But the fact that the response is actually a correct one is just coincidence.


So when a model succeeds on a reasoning task it's not positive evidence, but when it fails it's negative evidence? The good ol' confirmation bias feedback loop!


The article is about work to add a loop somewhat to that effect.



