Probabilistic AI, Deterministic Problem: Why AI Fails at Finance Math

Every Final-Year Number Was Wrong

SumProduct, an Excel modeling consultancy, tested Excel's new COPILOT function. They asked it to project five products over five years using given initial sales figures and annual growth rates. The function returned a complete table. Every column filled, every row consistent. It looked right.

Every number in the final year was wrong. Some errors were minor, others more noticeable. A cell that should have read 22,497 returned 22,517. And the function shows only the output: no formula, no calculation steps, no way to trace how any number was derived. The audit trail doesn't just fail to flag the error. There is no audit trail.

SumProduct's assessment: the numbers almost look right. That is the worst part.

The Vendor Agrees

Microsoft's own documentation for the COPILOT function warns users to avoid it for numerical calculations and to use native Excel formulas instead. The documentation explicitly states: avoid AI-generated outputs for financial reporting, legal documents, or other high-stakes scenarios. When the vendor tells you not to use their product for your use case, that warrants attention.

The broader evidence confirms the pattern. OpenAI's own research found that even advanced training methods raise LLM accuracy on numerical tasks to only 78%. That is a 22% failure rate on math problems, using the best available methods. CFA Institute testing found it gets worse: their RAG pipeline achieved only 55% accuracy on quantitative data versus 66% on qualitative information. The model does not even need to calculate in that test. It just needs to read a number from a document and report it accurately. It fails 45% of the time.

For a dashboard no one acts on, these rates might be acceptable. For a pricing formula, a margin calculation, or a financial projection, they are disqualifying.

The Deterministic Handover

This is not a quality problem that the next model release will solve. It is an architectural mismatch.

LLMs work by predicting the most likely next token in a sequence. For an LLM, "10" is no different from "blue." Both are tokens. Neither carries mathematical meaning. That mechanism is powerful for pattern recognition, document analysis, and synthesis across unstructured data. It is the wrong mechanism for arithmetic.

Growth projections are deterministic operations: given the same inputs and growth rates, the output must be identical every time. Excel solved this decades ago with formulas that are exact, auditable, and reproducible.
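To make "deterministic" concrete, here is a minimal Python sketch of a compound growth projection. The function name, the inputs, and the rounding rule are illustrative assumptions, not SumProduct's test data, and the sketch assumes growth is applied once per year starting from the initial figure.

```python
from decimal import Decimal, ROUND_HALF_UP


def project_sales(initial, annual_growth, years):
    """Return year-by-year sales, compounding growth once per year.

    Inputs are passed as strings so Decimal stores them exactly.
    (Hypothetical helper for illustration only.)
    """
    values = []
    current = Decimal(initial)
    rate = Decimal(annual_growth)
    for _ in range(years):
        # Carry the unrounded value forward; round only what is reported.
        current = current * (Decimal(1) + rate)
        values.append(current.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return values


# Assumed example inputs: 10,000 initial sales growing 10% a year.
projection = project_sales("10000", "0.10", 5)
print([int(v) for v in projection])
```

Run it a thousand times and the list is the same every time, which is exactly the property a token predictor cannot promise.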

Process selection is the decision that prevents this class of error. Finance operations fall into distinct categories. Pattern recognition and synthesis tasks suit AI. Exact reproducible outcomes need rules-based automation. Strategic judgment stays with humans. The SumProduct errors exist because AI was applied to tasks in the wrong category.

What I call the Deterministic Handover is the architectural principle that enforces this distinction. In a well-designed system, it should be impossible for numerical outputs to bypass deterministic execution. When an AI system encounters a calculation, it does not attempt the arithmetic. It routes the request to a rules-based engine that computes the result with guaranteed precision. The AI's role is translation: converting a question into a structured query that a deterministic tool executes.
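A rough sketch of what that routing could look like in Python follows. Everything named here (CalcRequest, ENGINE, execute, and the two operations) is hypothetical, and the language-model step that would produce the structured request is deliberately left out. The point is only that the number comes from the engine, never from the model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CalcRequest:
    """Structured query the AI layer is assumed to emit instead of a number."""
    operation: str
    args: dict


def compound_growth(initial: str, rate: str, years: int) -> Decimal:
    # Exact decimal arithmetic: initial * (1 + rate) ** years.
    return Decimal(initial) * (Decimal(1) + Decimal(rate)) ** years


def margin(revenue: str, cost: str) -> Decimal:
    rev = Decimal(revenue)
    return (rev - Decimal(cost)) / rev


# The only operations the system can execute. Anything else is rejected.
ENGINE = {"compound_growth": compound_growth, "margin": margin}


def execute(request: CalcRequest) -> Decimal:
    """Run a request on the rules-based engine, or refuse it outright."""
    try:
        handler = ENGINE[request.operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {request.operation}") from None
    return handler(**request.args)


# Assumed example: the AI translated a question into this request.
request = CalcRequest("compound_growth", {"initial": "10000", "rate": "0.10", "years": 5})
print(execute(request))
```

The design choice worth noting is the refusal path: a request the engine does not recognise raises an error instead of falling back to a guessed number.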

Why This Is a CFO Problem

The danger is not one wrong number. It is interdependence. In a financial model, each calculation feeds the next. Pricing feeds margin analysis, margin feeds inventory thresholds, inventory feeds working capital. The SumProduct test demonstrated exactly this: five products projected over five years, where each year's output depends on the previous year's result. The final year was furthest off because it had the most layers of prediction stacked on top of each other.

The errors didn't cancel out. They varied unpredictably in size and direction across cells. Because the function exposes no formulas or calculation steps, an analyst reviewing the table sees clean rows and has no way to detect the errors without rebuilding the entire model independently.
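For a sense of what that independent rebuild involves, here is a small Python sketch that compares an AI-generated table against recomputed values cell by cell. The function name and the cell label are assumptions. The two figures are the ones from the SumProduct example.

```python
def find_mismatches(reported, recomputed, tolerance=0):
    """Return (cell, reported, recomputed) for every cell that disagrees.

    `reported` holds the AI-generated values and `recomputed` the values
    rebuilt with ordinary formulas. Both are dicts keyed by a cell label.
    """
    mismatches = []
    for cell, expected in recomputed.items():
        got = reported.get(cell)
        # A missing cell counts as a mismatch, as does any gap over tolerance.
        if got is None or abs(got - expected) > tolerance:
            mismatches.append((cell, got, expected))
    return mismatches


# Hypothetical cell label; the values are those quoted in the test.
reported = {"Product A / Year 5": 22517}
recomputed = {"Product A / Year 5": 22497}
print(find_mismatches(reported, recomputed))
```

The check is trivial once the recomputed values exist. The cost sits in producing them, which means doing the deterministic work the AI function was supposed to save.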

The AI-in-Excel trend is accelerating. Add-ins and built-in functions make it easier than ever to put probabilistic prediction inside a tool built for deterministic computation. Before adopting any AI-powered spreadsheet tool, apply process selection at the task level. For every cell, every output: is this pattern recognition or deterministic calculation? If it's the latter, the spreadsheet already has the right tool. It's called a formula.

Is Your Finance Function Ready for AI?

Take the free 5-minute assessment to benchmark your AI readiness across strategy, use case selection, and governance — and get a personalised action plan.
