
What if AI Knew When to Say “I Don’t Know”?

Published 01/21/2026

Written by Kurt Seifried, Chief Innovation Officer, CSA.

Not a vocabulary problem. AI models can produce uncertainty language just fine: “I’m not sure,” “This may not be accurate,” “Please verify.” They say these things constantly. Sometimes appropriately. Often not.

The problem is knowing when it’s warranted.

You can prompt AI to justify its answers. Ask for chain of thought. Request confidence levels. And it will comply: produce reasoning steps, attach probability estimates, show its work. But this is performance on demand, not intrinsic capability. The model doesn’t track the epistemic status of its claims as a first-class thing. It generates a confidence story because you asked.

To be clear, modern language models do have internal token-level probabilities. They can be very confident about which token comes next. That is not the same as being confident in the idea, the claim, or the whole argument. A paragraph can be composed of individually likely tokens and still be globally wrong. What’s missing is a higher-order, persistent epistemic state, an answer to “What do I actually have grounds for?” that holds across multiple interdependent claims.
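To make the distinction concrete, here is a minimal sketch (assuming the Hugging Face transformers library, with GPT-2 as a stand-in model) that reads out the per-token probabilities a model assigns to a sentence. The specific claim and model are illustrative; the point is that these numbers live at the token level, not at the level of the claim.

```python
# A minimal sketch, assuming the Hugging Face transformers library and GPT-2
# as a stand-in model: read out per-token probabilities for a false claim.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

claim = "The Eiffel Tower is located in Berlin."
ids = tok(claim, return_tensors="pt").input_ids   # shape [1, seq_len]

with torch.no_grad():
    logits = model(ids).logits                    # shape [1, seq_len, vocab]

# Probability the model assigned to each actual token, given the prefix before it.
probs = torch.softmax(logits[0, :-1], dim=-1)
token_probs = probs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]

for token, p in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), token_probs):
    print(f"{token:>12}  p={float(p):.3f}")

# Many of these tokens are individually likely, yet nothing in this computation
# represents whether the model has grounds for the sentence as a whole.
```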

That’s not a quirk. That’s a risk management problem.


The epistemology problem hiding in plain sight

Epistemology asks a deceptively simple question: What’s the difference between believing something and knowing it?

The classic answer is that knowledge is justified belief. You need grounds for confidence, not just confidence itself.

AI hallucination isn’t just “making things up.” It’s assertion without sufficient justification. The model has “beliefs” in the pragmatic sense that it produces some claims as if they were true. It lacks reliable awareness of whether those claims are justified. It can’t consistently distinguish “I have strong grounds for this” from “I’m pattern-matching into a gap.”

Humans do this too. The mechanisms differ and the consequences scale differently, but the failure pattern is recognizable: confident assertion outrunning justification. AI makes that mismatch visible at industrial scale, which is why it feels like a crisis.


Wikipedia’s 25-year epistemology project

Wikipedia spent 25 years building explicit epistemic markers, and we mostly ignore them as training signals.

You’ve seen these:

On factual claims:

  • [citation needed] — what’s your justification?
  • [unreliable source?] — justification quality questioned
  • [verification needed] — not yet checked

On reasoning and framing:

  • [original research] — inference not externally supported
  • [disputed] — conclusion contested
  • [neutrality disputed] — framing or reasoning bias suspected

And in the talk pages:

Challenging facts:

  • “Source for this?”
  • “Reverted, unsourced claim”

Challenging reasoning:

  • “This doesn’t follow from the premises”
  • “Correlation isn’t causation here”
  • “Consensus not reached”

Messy, human, sometimes wrong, sometimes biased, but crucially, the justification layer is visible.

See Wikipedia’s list of inline cleanup tags, which functions as a catalog of epistemological tags.


The scarcity ladder: outputs, revisions, and “why”

High-quality epistemic training signals are rare.

Most text is final-form output (books, articles, polished docs). The deliberation that preceded them (the doubts, the alternatives rejected) is edited out. Even when drafts exist, they’re rarely published with change tracking.

A smaller subset includes revision history (code repositories, issue trackers, versioned docs, Wikipedia article diffs). This is valuable, and plausibly part of why models are so strong at programming. Open source contains immense volumes of change-level data and corrections.

But the smallest and most valuable subset includes explicit epistemic contestation and reasoning about the why (arguments over sources, disputes over inference, adversarial checking, and structured tags that mark the status of claims).

Wikipedia is unusually rich here because it bundles all three: final outputs, full revision history, and an explicit deliberation layer where people challenge both facts and reasoning. It’s not perfect. Nothing human is. But in terms of open, longitudinal, structured epistemic metadata, it’s hard to beat.


Facts are easier. Reasoning chains are hard.

Epistemic status of facts is relatively tractable. “The population of the Edmonton metropolitan area is 1.2 million.” Verifiable. Checkable. You can tag it [citation needed] and resolve it with a source.

Epistemic status of reasoning chains is genuinely hard. “Because A, and A implies B, and B suggests C, therefore we should do X.” Each step might be individually defensible. The chain can still be broken. Uncertainty compounds. Which inference is the weak link?
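A toy calculation shows how fast a chain degrades. The per-step confidences below are invented for illustration, and multiplying them assumes the steps are independent, which real reasoning chains rarely are:

```python
# Toy illustration of compounding uncertainty in a reasoning chain.
# Per-step confidences are made up; independence is a simplifying assumption.
steps = {
    "A holds": 0.95,
    "A implies B": 0.90,
    "B suggests C": 0.70,
    "C justifies doing X": 0.85,
}

chain_confidence = 1.0
for claim, p in steps.items():
    chain_confidence *= p

weakest = min(steps, key=steps.get)
print(f"Chain confidence: {chain_confidence:.2f}")   # ~0.51, despite no step below 0.70
print(f"Weakest link: {weakest} ({steps[weakest]:.2f})")
```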

This is where prompted chain-of-thought falls short. Yes, the model can produce reasoning steps. But it can’t reliably evaluate the epistemic status of each step. It can’t consistently say “Step 2 is solid, step 4 is where I’m extrapolating without strong grounds.”

Wikipedia talk pages capture exactly this kind of evaluation. That’s rare.

Reddit has opinions without justification requirements.

Stack Overflow is instructive for a different reason. The top-voted answer, the accepted answer, and the correct answer can be three entirely different things. And crucially, why something is the top answer is almost never explained. Was it first? Clearest? Actually correct? Voting captures popularity and utility, not epistemic status. The reasoning that evaluated the answers, if it happened, isn’t visible.

Books and articles omit the deliberation. Academic peer review has rigor, but it’s mostly private. Wikipedia is one of the rare outlets that made the justification layer visible.


What risk management actually needs

Risk management needs to know where uncertainty lives. Is it in the data? The inference? The scope of applicability? The chain connecting premises to conclusion?

What you want is not “hedging language.” You want evidence-gated outputs: justification thresholds that decide whether a claim is safe to assert at all.

Think intelligence tradecraft rather than conversational politeness.

  • “I can’t corroborate this with sufficient high-quality sources.”
  • “This conclusion depends on assumption X, which I cannot validate.”
  • “Here are competing hypotheses and what would discriminate between them.”
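As one sketch of what evidence gating could look like in practice: the decision to assert or refuse is made by a justification threshold, not by how hedged the wording sounds. The Source type, the thresholds, and the scoring below are illustrative assumptions, not an existing API.

```python
# Minimal sketch of an evidence gate: a claim is only asserted when enough
# independent, sufficiently reliable sources corroborate it. All names and
# numbers here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Source:
    origin: str          # e.g. publisher or domain
    reliability: float   # 0.0-1.0, assigned by whatever vetting you trust
    corroborates: bool   # does this source actually support the claim?

def gate_claim(claim: str, sources: list[Source],
               min_sources: int = 2, min_reliability: float = 0.7) -> str:
    strong = {s.origin for s in sources
              if s.corroborates and s.reliability >= min_reliability}
    if len(strong) >= min_sources:
        return claim
    return (f"I can't corroborate this with sufficient high-quality sources "
            f"({len(strong)} of {min_sources} required).")

print(gate_claim("Service X encrypts data at rest.",
                 [Source("vendor-docs", 0.9, True),
                  Source("blog-post", 0.4, True)]))
```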


“Just add a layer” is not enough

You can bolt epistemic controls on after the fact: retrieval, validators, ensembles, policy checks.

That helps. It’s also structurally limited.

Once the model has emitted a confident-sounding answer, you’re in cleanup mode. You detect the error, locate it, decide what to do with it, rewrite, and explain. A sufficiently ambitious orchestration layer also starts to look like another AI system making judgments about evidence and inference. If that layer doesn’t have better epistemic grounding than the model it supervises, you’ve just moved the same problem up a level.
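A deliberately simple sketch of such a bolt-on layer (hypothetical, not any particular product) shows the shape of the problem: by the time it runs, the confident prose already exists, and all it can do is annotate.

```python
# Sketch of a post-hoc validation layer. The "validator" here is a toy stand-in
# for retrieval or fact-checking: it flags sentences with no citation marker.
def toy_validator(answer: str) -> list[str]:
    return [s for s in answer.split(". ") if s and "[source:" not in s]

def cleanup_layer(answer: str) -> str:
    unsupported = toy_validator(answer)          # detect + locate
    flagged = answer
    for claim in unsupported:
        flagged = flagged.replace(claim, f"{claim} [citation needed]")
    return flagged                               # rewriting and explaining still remain

print(cleanup_layer("Service X is SOC 2 certified. Keys rotate daily [source: docs]."))
```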

This is why intrinsic epistemic tracking matters. Not because external layers are useless, but because without native epistemic discipline, you are constantly compensating for a missing capability instead of building on it.


What you can do now

  • Ask for sources and justification, not just answers.
  • Enforce thresholds: “If you can’t find enough high-quality sources, say ‘I don’t know.’”
  • Separate facts from inferences.
  • Ask “What’s the weakest link in this reasoning?”
  • Run small experiments: vary prompts, request counterarguments, and see what breaks.
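One lightweight way to make these habits routine is to standardize them in a reusable preamble. The wording below is an example to adapt, not a validated template:

```python
# Sketch of a reusable instruction block that enforces a justification threshold.
EPISTEMIC_PREAMBLE = """\
Before answering:
1. Separate facts from inferences and label each.
2. Cite at least two independent, high-quality sources for each factual claim.
3. If you cannot meet that bar, answer exactly: "I don't know."
4. End with the single weakest link in your reasoning.
"""

def build_prompt(question: str) -> str:
    return f"{EPISTEMIC_PREAMBLE}\nQuestion: {question}"

print(build_prompt("Does vendor Y support customer-managed encryption keys?"))
```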


The open question

Can models learn to track justification status rather than generate it on demand? Or is reasoning-chain uncertainty fundamentally harder to learn than factual uncertainty?

I don’t know. But I think it’s worth finding out, especially as we move toward multi-step actions and AI agents that can autonomously delegate work to other agents.

If you’re working in AI, epistemology isn’t optional. It’s the foundation for understanding what your systems actually know and what they just believe. We built machines that reflect our epistemic failures back at us, and the only durable fix is better justification discipline, in the model and in ourselves.



Method: This article was developed through reasoning with multiple AIs across multiple exchanges, testing framings, challenging assumptions, and refining the thesis iteratively. The conversation captured the justification process, not just the conclusions. Exactly the kind of data we’re arguing is undersupplied in current training sets.
