That's basically a variant of the halting problem, and what you hope to get is a supervisor responding. If people expected this, I don't think they would be as confused about the difference between a statistical analysis of responses, which need emotions to be convincing, and an LLM actually showing atonement.
Simple algorithms can, e.g., be designed to report whether they hit an exceptional case and ran a different set of operations than usual.
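To make that concrete, here is a minimal sketch in Python of an algorithm that reports its own exceptional path. The names `Report` and `safe_mean` are made up for illustration and are not from any library; the choice of "empty input" as the exceptional case is also just an assumption for the example.

```python
from dataclasses import dataclass


@dataclass
class Report:
    value: float
    exceptional: bool  # True if the non-usual branch ran
    note: str          # human-readable description of the path taken


def safe_mean(values):
    """Return the mean of `values` plus a flag saying which path was taken.

    Hypothetical example: the "exceptional case" here is an empty input,
    for which a default value is returned instead of dividing by zero.
    """
    if not values:
        # Exceptional case: a different set of operations than usual.
        return Report(value=0.0, exceptional=True,
                      note="empty input, returned default 0.0")
    # Usual case: plain arithmetic mean.
    return Report(value=sum(values) / len(values), exceptional=False,
                  note="normal path")


normal = safe_mean([1, 2, 3])
fallback = safe_mean([])
print(normal.exceptional, normal.note)
print(fallback.exceptional, fallback.note)
```

The point is only that the flag is set by the control flow itself, so the report is a direct record of what the code did rather than something generated to sound plausible afterwards.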