The question is more whether LLMs can accurately report their internal operation... | Hacker News


roywiggins 11 days ago | on: Training LLMs for honesty via confessions

The question is more whether LLMs can accurately report their internal operations, not whether any of that counts as "thinking." Simple algorithms can, e.g., be designed to report whether they hit an exceptional case and activated a different set of operations than usual.
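A minimal sketch of that last point, in Python. The names `Report` and `mean_with_report` are hypothetical and only for illustration; the idea is just that the function returns, alongside its result, a record of which branch actually ran:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Report:
    value: float  # the computed result
    path: str     # which branch of the algorithm actually ran


def mean_with_report(xs: Sequence[float]) -> Report:
    """Compute a mean and report whether the exceptional branch was taken."""
    if len(xs) == 0:
        # Exceptional case: no data, so a different set of operations runs.
        return Report(value=0.0, path="fallback: empty input")
    # Usual case.
    return Report(value=sum(xs) / len(xs), path="normal")


print(mean_with_report([1.0, 2.0, 3.0]).path)
print(mean_with_report([]).path)
```

Here the self-report is accurate by construction, since the `path` label is set in the same branch that does the work. Whether an LLM's report about its own operation is tied to what it did in any comparable way is the open question.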

BaconVonPork 11 days ago

That's basically a variant of the halting problem and what you hope to get is a supervisor responding. If people expected this I don't think they would be as confused about the difference between statistical analysis of responses requiring emotions to be convincing and an LLM showing atonement.
