
How to Validate an IFRS 17 Accounting Engine: The Testing Framework


Accounting-engine validation is one of the best places on an IFRS 17 programme to learn the standard in depth. It forces you to sit simultaneously inside the standard itself, the engine’s interpretation of it, the insurer’s methodology choices, and the engine’s input requirements. The work that surfaces when these four things meet is the work that matters — and it is where most of the interesting bugs live.

This post is about the validation framework — what you are validating, how the test cycle is structured, how to debug differences efficiently, and what the audit trail has to contain. The supporting tools (shadow model, test case generator, comparison tool) have their own post to come; this one is about the methodology.

1. What you are validating

The IFRS 17 accounting engine — the measurement engine — is responsible for three classes of output. Validation has to cover all three.

  • Business events. Every cash flow movement in a reporting period that needs to be recorded so that the roll-forward disclosures reconcile from opening to closing. These are the movements behind the numbers, not the numbers themselves.
  • Journal entries. The debits and credits produced by the engine’s posting logic, which together form the IFRS 17 subledger. Journal entries are business events transformed by the engine’s posting configuration.
  • Disclosure tables. The standard’s required disclosures, produced from the subledger. These are the outputs an auditor looks at first, but they are the last thing in the dependency chain — if the business events and journals are wrong, the disclosures are wrong by construction.
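The dependency chain above can be sketched in a few lines. This is a minimal illustration, not how any particular engine works: the event codes, account names, and the `POSTING_CONFIG` mapping are all hypothetical, and the "disclosure" here is reduced to a net movement per account.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BusinessEvent:
    group: str        # group of contracts (hypothetical identifier)
    event_type: str   # hypothetical event code, e.g. "CSM_RELEASE"
    amount: Decimal


@dataclass(frozen=True)
class JournalLine:
    group: str
    account: str
    debit: Decimal
    credit: Decimal


# Hypothetical posting configuration: event type -> (debit account, credit account).
POSTING_CONFIG = {
    "PREMIUM_RECEIVED": ("Cash", "LRC liability"),
    "CSM_RELEASE": ("CSM liability", "Insurance revenue"),
}


def post(events):
    """Turn each business event into a balanced debit/credit pair."""
    zero = Decimal("0")
    lines = []
    for ev in events:
        debit_account, credit_account = POSTING_CONFIG[ev.event_type]
        lines.append(JournalLine(ev.group, debit_account, ev.amount, zero))
        lines.append(JournalLine(ev.group, credit_account, zero, ev.amount))
    return lines


def disclosure_totals(lines):
    """Aggregate the subledger into net movement (debit minus credit) per account."""
    totals = defaultdict(Decimal)
    for line in lines:
        totals[line.account] += line.debit - line.credit
    return dict(totals)
```

Because the totals are derived only from journal lines, and the journal lines only from events, an error in an event propagates all the way to the disclosure. That is why validation has to check each layer rather than just the final tables.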

The upstream actuarial model, which produces the projected cash flows that feed the measurement engine, is out of scope here. It has its own validation regime. The engine validation starts with those inputs treated as given and tests everything the engine does with them.

2. The replication approach

The framework rests on one architectural decision: every output the engine produces has to be reproducible outside the engine, by an independent calculation path, against known inputs. The independent path is usually a shadow model — a prototype implementation of the IFRS 17 measurement rules that the team controls end-to-end.

The shadow model does not need to be as fast, polished, or feature-complete as the engine. It needs to be transparent. Every number has to be traceable back to the inputs that drove it, ideally in a format a reviewer can interrogate directly. Excel live formulas remain the most widely accepted audit-trail format for exactly this reason.

The test is always the same: run inputs through both paths, compare outputs, explain every difference.
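As a rough sketch of that loop, assuming both paths can be wrapped as functions that take the same inputs and return named outputs: the names `replicate`, `unexplained`, `shadow_pv`, and `engine_stub_mid_year` are illustrative, and the present-value calculation is a toy stand-in for real measurement logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

Outputs = Dict[str, float]
Path = Callable[[dict], Outputs]
Pair = Tuple[Optional[float], Optional[float]]


def replicate(inputs: dict, engine_path: Path, shadow_path: Path,
              tolerance: float = 0.005) -> Dict[str, Pair]:
    """Run the same inputs through both paths and return every difference."""
    engine_out = engine_path(inputs)
    shadow_out = shadow_path(inputs)
    differences = {}
    for key in sorted(set(engine_out) | set(shadow_out)):
        e, s = engine_out.get(key), shadow_out.get(key)
        # An output missing on one side counts as a difference too.
        if e is None or s is None or abs(e - s) > tolerance:
            differences[key] = (e, s)
    return differences


def unexplained(differences: Dict[str, Pair],
                explanations: Dict[str, str]) -> List[str]:
    """Differences that nobody has explained yet."""
    return sorted(k for k in differences if k not in explanations)


def shadow_pv(inputs: dict) -> Outputs:
    """Toy shadow model: end-of-year discounting of annual cash flows."""
    rate = inputs["rate"]
    pv = sum(cf / (1 + rate) ** t
             for t, cf in enumerate(inputs["cash_flows"], start=1))
    return {"pv_cash_flows": pv}


def engine_stub_mid_year(inputs: dict) -> Outputs:
    """Toy stand-in for the engine, using mid-year discounting instead."""
    rate = inputs["rate"]
    pv = sum(cf / (1 + rate) ** (t - 0.5)
             for t, cf in enumerate(inputs["cash_flows"], start=1))
    return {"pv_cash_flows": pv}


inputs = {"rate": 0.05, "cash_flows": [100.0, 100.0, 100.0]}
diffs = replicate(inputs, engine_stub_mid_year, shadow_pv)
open_items = unexplained(diffs, explanations={})
```

Here the two stubs disagree on the timing convention, so `pv_cash_flows` appears in `diffs` and stays in `open_items` until someone records an explanation for it.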

3. Automation is structural, not optional

Validation without automation does not survive real contract volumes or real test-sprint cadences. Manual execution introduces errors, runs slowly, does not scale across a team, and produces a fragile audit trail. An automated pipeline writes its own trail as a side effect; a manual process requires someone to remember to document what happened.

The rule I work to is simple: if a step in the validation cycle is being done by hand more than three times, it needs to be automated. The initial engineering cost pays back within the first test sprint.

4. The testing cycle

A validation test sprint runs through the following steps. In a mature automation setup, most of these are a single command.

  • Test case design and batching. Test cases are grouped into batches aimed at specific items: a batch for simple measurement under the general measurement model (GMM), a batch for unlocking of the contractual service margin (CSM), a batch for onerous-contract loss component mechanics, a batch for the premium allocation approach (PAA). Design starts simple and builds complexity only after the simple cases pass. The design has to be agile: test cases get tweaked as issues emerge mid-sprint rather than held over to the next release.
  • Test case input generation. A test case generator produces the input data for each test case in a shape aligned with how the insurer’s real data will eventually land in the engine. Generating test case data in the production shape means the shadow model can later take live data directly without rework.
  • Input conversion. The generator’s output is converted into the formats the engine and the shadow model each require. If both accept the same shape, this step disappears. If not, the converter itself is a tested component.
  • Data checks on both input sets. Across the engagements I have seen, this step catches the cause of more than 70% of the differences. When the engine and the shadow model receive subtly different inputs, the outputs look like engine bugs but are really input bugs. Data checks have to be built into the generator, the converter, the shadow model, and the engine’s ingestion, and all four have to agree.
  • Execution on both paths. Both systems run the test case and produce business events, journal entries, and disclosures.
  • Result extraction. Outputs from both systems are pulled into a common format.
  • Comparison. A comparison tool imports both result sets and highlights every difference above a tolerance, mapped to the relevant business event, journal, or disclosure line.
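The comparison step might look something like the sketch below. It assumes both result sets have already been extracted into a common shape keyed by test case, output class, and line item; that key structure, the `compare` function, and the default tolerance are assumptions made for illustration.

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple, Optional, Tuple

# (test_case, output_class, line_item); output_class is one of
# "business_event", "journal", "disclosure" in this sketch.
Key = Tuple[str, str, str]


class Difference(NamedTuple):
    key: Key
    engine: Optional[Decimal]
    shadow: Optional[Decimal]


def compare(engine: Dict[Key, Decimal], shadow: Dict[Key, Decimal],
            tolerance: Decimal = Decimal("0.01")) -> List[Difference]:
    """List every line that is missing on one side or differs by more than the tolerance."""
    differences = []
    for key in sorted(engine.keys() | shadow.keys()):
        e, s = engine.get(key), shadow.get(key)
        if e is None or s is None or abs(e - s) > tolerance:
            differences.append(Difference(key, e, s))
    return differences


engine_results = {
    ("TC001", "business_event", "CSM_RELEASE"): Decimal("120.00"),
    ("TC001", "journal", "CSM liability"): Decimal("120.00"),
    ("TC001", "disclosure", "Insurance revenue"): Decimal("120.004"),
}
shadow_results = {
    ("TC001", "journal", "CSM liability"): Decimal("118.50"),
    ("TC001", "disclosure", "Insurance revenue"): Decimal("120.00"),
}
differences = compare(engine_results, shadow_results)
```

In this example the disclosure line falls inside the tolerance and is not reported, while the journal line and the business event missing from the shadow results both are. Treating a missing row as a difference matters: a misaligned import otherwise hides real gaps.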

Every one of those steps leaves structured logs. The structured logs are the raw material for the audit trail.
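One simple way to get structured logs is to write one JSON record per step. The sketch below assumes a JSON Lines layout; the `log_step` helper and the field names (`sprint`, `step`, `status`, `details`) are hypothetical choices, not a required schema.

```python
import io
import json
from datetime import datetime, timezone


def log_step(log, sprint: str, step: str, status: str, **details) -> dict:
    """Append one machine-readable record for a step of the test cycle."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sprint": sprint,
        "step": step,
        "status": status,
        "details": details,
    }
    # One JSON object per line keeps the log easy to append to and to parse.
    log.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# An in-memory buffer stands in for a log file here.
log = io.StringIO()
log_step(log, "sprint-01", "input_conversion", "ok", rows=12)
log_step(log, "sprint-01", "comparison", "differences", count=2)
records = [json.loads(line) for line in log.getvalue().splitlines()]
```

Because each record can be parsed back into a dictionary, the sprint documentation can be assembled from the log rather than from memory.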

5. Debugging order

When the comparison shows a difference, it could have come from any of five places. In order of how often I see each cause, roughly:

  • Input errors on the shadow-model side. Something about how the shadow model interpreted the test case input differs from what was intended.
  • Input errors on the engine side. Same, but on the engine.
  • Comparison-tool errors. Mapping logic between the two result sets is wrong, or an import issue has misaligned rows.
  • Configuration differences between the two systems. Different methodology choices — OCI option on one side and off the other, different coverage unit patterns, different risk adjustment approach.
  • Real methodology differences. The engine and the shadow model genuinely disagree on how a calculation should be done, and one of them is right.

The efficient approach is to work through the list from top to bottom. Most differences are resolved before you reach real methodology disagreements, and the ones that remain are the interesting ones: they are where the team learns something about the standard or the engine.

The wasted-time failure mode is jumping to the bottom of the list first. A difference in the CSM roll-forward number does not usually come from a bug in the CSM mechanics. It usually comes from an input that was converted wrongly three steps upstream.
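A sketch of what "check inputs first" can mean in practice: compare the inputs each system actually read against what the test case intended, field by field, before touching the calculation logic. The record layout, the `input_mismatches` and `triage` helpers, and the example values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Records = Dict[str, Dict[str, object]]


def input_mismatches(intended: Records, as_read: Records) -> Dict[Tuple[str, str], tuple]:
    """Return {(record_id, field): (intended_value, value_as_read)} for every disagreement."""
    mismatches = {}
    for record_id in sorted(set(intended) | set(as_read)):
        want = intended.get(record_id, {})
        got = as_read.get(record_id, {})
        for field in sorted(set(want) | set(got)):
            if want.get(field) != got.get(field):
                mismatches[(record_id, field)] = (want.get(field), got.get(field))
    return mismatches


def triage(checks: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Run the checks in debugging order and name the first cause that is confirmed."""
    for cause, is_confirmed in checks:
        if is_confirmed():
            return cause
    # Only when every cheaper explanation is ruled out is methodology the suspect.
    return "methodology difference"


intended = {"C1": {"premium": 1000, "discount_rate": 0.03}}
shadow_read = {"C1": {"premium": 1000, "discount_rate": 0.03}}
# Hypothetical conversion error: a rate delivered as a percentage, not a decimal.
engine_read = {"C1": {"premium": 1000, "discount_rate": 3.0}}

cause = triage([
    ("shadow-model input", lambda: bool(input_mismatches(intended, shadow_read))),
    ("engine input", lambda: bool(input_mismatches(intended, engine_read))),
    # Checks for the comparison tool and configuration would follow here.
])
```

In this example `cause` comes back as `"engine input"`, and the mismatch report points straight at the `discount_rate` field, which is a much cheaper finding than a walkthrough of the CSM mechanics.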

6. The audit trail

The audit trail has to answer one question: can a reviewer, a year from now, reproduce what was done here and reach the same conclusion? To do that, it has to contain:

  • Test case inputs and results for both the engine and the shadow model, preserved as run.
  • The version of every tool and configuration used.
  • The comparison output, with every difference explained or flagged as open.
  • Sprint documentation covering what was tested, what issues came up, what was remediated, what was deferred, and who signed off.

The practical test is whether a new reviewer, given only the trail and access to the tools, can rerun the sprint and get the same results. If they can, it is complete. If they cannot, the gap will show up when the auditor asks.
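One way to make that rerun test mechanical is to store a manifest of content hashes alongside the tool versions, then compare a rerun against it. This is a minimal sketch under assumed names: the manifest layout, the file names, and the version strings are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(payload: bytes) -> str:
    """SHA-256 hex digest of an artifact's exact bytes."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(artifacts: Dict[str, bytes], versions: Dict[str, str]) -> dict:
    """Record tool versions and a fingerprint for every input and result file."""
    return {
        "versions": dict(versions),
        "artifacts": {name: fingerprint(data)
                      for name, data in sorted(artifacts.items())},
    }


def rerun_gaps(manifest: dict, rerun_artifacts: Dict[str, bytes]) -> List[str]:
    """Names of artifacts that the rerun failed to reproduce byte for byte."""
    recorded = manifest["artifacts"]
    gaps = []
    for name in sorted(set(recorded) | set(rerun_artifacts)):
        if name not in recorded or name not in rerun_artifacts:
            gaps.append(name)  # missing from the trail or from the rerun
        elif recorded[name] != fingerprint(rerun_artifacts[name]):
            gaps.append(name)  # present on both sides but the content changed
    return gaps


original = {
    "inputs.csv": b"contract,premium\nC1,1000\n",
    "engine_results.csv": b"line,amount\nCSM,120.00\n",
}
manifest = build_manifest(original, {"engine": "hypothetical-1.0",
                                     "shadow_model": "hypothetical-0.3"})
rerun = dict(original)
rerun["engine_results.csv"] = b"line,amount\nCSM,118.50\n"
gaps = rerun_gaps(manifest, rerun)
```

An empty list from `rerun_gaps` means the rerun reproduced every recorded artifact. Here it returns `["engine_results.csv"]`, which is the gap a reviewer would need explained.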

7. Non-functional testing

Functional testing (does the engine produce the right numbers?) is necessary but not sufficient. A production engine also has to be validated for user access controls, workflow behaviour under concurrent use, performance at production data volumes, backup and restore, and data security at rest and in transit. These sit outside the shadow-model framework and are usually handled by the insurer’s IT and audit functions, but they have to happen before the engine is production-ready. A functionally perfect engine with broken access controls is not go-live material.

Closing thought

IFRS 17 engine validation is a structured, automated, auditable exercise. It is not a checklist and it is not a one-off. The framework above — replication via shadow model, automated test cycle, disciplined debugging order, complete audit trail — is what carries a programme from first test sprint through parallel-run into business-as-usual reporting. The teams that treated validation as a structural programme rather than a late-stage check were the ones that reached go-live without drama.
