
Fact-Checking Methodology

Analytical framework, source tiers, label definitions, and limitations of the verification process.

Critical Caveat

The conflict described in the assessed documents began on February 28, 2026. Many specific claims within the assessments are scenario elements generated by AI models — they describe events within an AI-constructed conflict simulation, not verified real-world occurrences.

This fact-checking effort focuses on whether the background facts, military capabilities, economic data, and analytical frameworks referenced in the assessments are grounded in reality. Scenario-specific events (casualties, specific incidents, market movements) are evaluated for plausibility against historical precedent but cannot be verified as factual.

Additionally, this fact-checking was itself performed by an AI system (Claude), which introduces its own potential for error, bias, and hallucination. Users should treat the verification results as an analytical aid, not as authoritative determinations of truth.

Analytical Approach

Claims were extracted from all three AI assessments through systematic review of each page across all assessment websites. The extraction process followed these steps:

  1. Claim Identification: Each factual assertion, data point, statistic, event description, and analytical projection was identified and extracted from the assessment text.
  2. Grouping: Similar claims across assessments were consolidated. When multiple assessments made the same or similar claims, they were evaluated together to assess cross-model consistency.
  3. Classification: Each claim was classified into one of three categories: Background Fact, Scenario Element, or Analytical Projection (see definitions below).
  4. Verification: Background facts were checked against institutional sources. Scenario elements were evaluated for plausibility against historical precedent. Analytical projections were assessed for methodological soundness.
  5. Labeling: Each claim received a verification label (Verified, Partially Verified, Unverified, or Disputed) based on the evidence found.
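The five steps above can be sketched as a small data model. This is a minimal illustrative sketch, not tooling used in the actual verification — the type names (`Claim`, `ClaimType`, `Label`) and the example claim are hypothetical:

```python
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum

# Illustrative types only; the actual fact-checking was qualitative review,
# not automated processing.
class ClaimType(Enum):
    BACKGROUND_FACT = "background fact"
    SCENARIO_ELEMENT = "scenario element"
    ANALYTICAL_PROJECTION = "analytical projection"

class Label(Enum):
    VERIFIED = "verified"
    PARTIALLY_VERIFIED = "partially verified"
    UNVERIFIED = "unverified"
    DISPUTED = "disputed"

@dataclass
class Claim:
    text: str                   # step 1: the extracted assertion
    assessments: list[str]      # step 2: which assessments made it
    claim_type: ClaimType       # step 3: classification
    label: Label | None = None  # steps 4-5: verification outcome

claim = Claim(
    text="Roughly one-fifth of global oil consumption transits the Strait of Hormuz",
    assessments=["Claude", "Codex", "Gemini"],
    claim_type=ClaimType.BACKGROUND_FACT,
)
claim.label = Label.VERIFIED  # assigned after checking institutional sources
```

The point of the structure is that classification (step 3) happens before verification (steps 4–5): the claim type determines which verification standard applies.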

Claim Type Definitions

Understanding the distinction between claim types is essential for interpreting the verification results. Not all claims in a strategic assessment are the same kind of assertion.

Background Facts (Verifiable)
Pre-existing data points about military capabilities, economic statistics, historical events, geographic facts, or institutional information that exist independently of the AI-generated scenario. Examples include: Iran's missile inventory estimates, Strait of Hormuz oil transit volumes, defense system costs, country GDP data, and historical event references. These claims can be directly verified against institutional sources (EIA, IEA, IISS, SIPRI, IMF, World Bank, IAEA, etc.).
Scenario Elements (Not Verifiable)
Events, casualties, incidents, and developments that are part of the AI-generated conflict scenario. These describe things that happen within the AI's simulation of the conflict and have no corresponding real-world events to verify against. Examples include: specific casualty numbers, the Minab school incident, specific operation outcomes, and political responses to fictional events. These can be assessed for plausibility (do they match historical patterns?) but cannot be fact-checked in the traditional sense.
Analytical Projections (Partially Verifiable)
Forecasts, probability estimates, and projected outcomes based on models, assumptions, and analytical frameworks. Examples include: oil price spike ranges, GDP impact scenarios, casualty projections, and strategic outcome probabilities. These can be evaluated by checking whether the underlying assumptions and models are sound, whether the projections fall within ranges produced by established forecasting institutions, and whether the methodology is consistent with standard analytical practice. The projection itself cannot be verified as correct (it describes the future), but the analytical basis can be assessed.

Source Tier System

Verification sources were organized into a four-tier system reflecting credibility, institutional backing, and data reliability. Higher-tier sources were given greater weight in verification assessments.

  • Tier 1 (weight: Highest) — Official Government & Institutional Data: EIA, IEA, IAEA, IMF, World Bank, UN agencies, US Congressional Budget Office, US Government Accountability Office, US Congressional Research Service (CRS), OPEC official data
  • Tier 2 (weight: High) — Major Defense & Think Tank Analysis: IISS (International Institute for Strategic Studies), SIPRI (Stockholm International Peace Research Institute), RAND Corporation, Brookings Institution, Council on Foreign Relations (CFR), Center for Strategic and International Studies (CSIS), Carnegie Endowment for International Peace
  • Tier 3 (weight: Moderate) — Quality Journalism: Reuters, Associated Press, BBC, The New York Times, The Washington Post, The Wall Street Journal, Financial Times, The Economist, Al Jazeera English
  • Tier 4 (weight: Moderate, specialist context) — Specialist Defense & Industry Publications: Janes Defence Weekly, Defense News, The War Zone (The Drive), Aviation Week, Breaking Defense, Naval Institute Proceedings, FlightGlobal
Note on source usage: In practice, verification often required cross-referencing across multiple tiers. For example, a military capability claim might be checked against IISS data (Tier 2), confirmed in Jane's reporting (Tier 4), and contextualized by CRS reports (Tier 1). No single source was treated as definitive for complex claims.
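The tier system can be pictured as a simple weighting scheme. The numeric weights below are illustrative assumptions — the actual methodology weighed tiers qualitatively and assigned no scores:

```python
# Illustrative weights; the methodology itself did not use numeric scoring.
TIER_WEIGHT = {
    1: 1.0,   # official government & institutional data
    2: 0.8,   # major defense & think-tank analysis
    3: 0.6,   # quality journalism
    4: 0.6,   # specialist publications (weight depends on context)
}

def weighted_support(source_tiers: list[int]) -> float:
    """Total weight of the sources found to support a claim."""
    return sum(TIER_WEIGHT[t] for t in source_tiers)

# e.g., a capability claim checked against IISS (Tier 2),
# Janes (Tier 4), and a CRS report (Tier 1):
weighted_support([2, 4, 1])
```

This also captures the cross-referencing practice: support accumulates across tiers rather than resting on any single source.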

Verdict Label Definitions

Each claim received one of four verification labels. The criteria for each label are defined below.

✓ VERIFIED
The claim is supported by at least two independent, reputable sources (Tier 1–3). The data point, fact, or assertion is consistent with established institutional reporting. Minor variations in exact figures (e.g., "20%" vs. "21%") are acceptable if they fall within standard measurement or reporting margins. Claims rated VERIFIED represent background facts with strong evidentiary support. This does not mean the claim is absolutely certain — only that it is well-supported by available evidence.
~ PARTIALLY VERIFIED
The claim contains elements that are supported by evidence alongside elements that cannot be confirmed. This typically applies to claims that combine a verifiable baseline fact with an AI-generated application or projection. For example: a real missile production rate applied to a fictional consumption scenario. Claims may also be PARTIALLY VERIFIED when the general direction or order of magnitude is supported but the specific number cited cannot be precisely confirmed. This is the most common label for analytical projections grounded in real data.
? UNVERIFIED
The claim cannot be confirmed or denied based on available evidence. This most commonly applies to scenario elements — events, casualties, and incidents generated by the AI as part of its conflict simulation. UNVERIFIED does not mean the claim is false; it means there is no independent evidence to support or refute it. Many UNVERIFIED claims are plausible within their scenario context but are inherently beyond fact-checking because they describe fictional events. Political polling figures, specific casualty counts, and unique incident details typically receive this label.
✗ DISPUTED
The claim contradicts available evidence, contains factual inaccuracies, or is inconsistent with data from multiple reputable sources. Claims may also be DISPUTED when different assessments make contradictory assertions about the same factual matter. This is the most serious finding and is applied sparingly. A DISPUTED label requires positive evidence of inaccuracy, not merely absence of supporting evidence (which would warrant UNVERIFIED). Examples include incorrect institutional data, misattributed quotes, or factual errors about well-documented events.
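The four label definitions imply a rough decision order, which can be sketched as follows. This toy rule collapses the richer criteria (e.g., mixed real/fictional claims) into simple flags and source counts, so it is a hedged approximation of a process that was actually qualitative:

```python
def assign_label(supporting_sources: int, contradicted: bool,
                 is_scenario_element: bool) -> str:
    """Toy distillation of the label criteria; thresholds are illustrative."""
    if contradicted:
        # DISPUTED requires positive evidence of inaccuracy,
        # not merely an absence of supporting evidence.
        return "DISPUTED"
    if is_scenario_element:
        # Fictional events cannot be fact-checked in the traditional sense.
        return "UNVERIFIED"
    if supporting_sources >= 2:
        # At least two independent, reputable (Tier 1-3) sources.
        return "VERIFIED"
    if supporting_sources == 1:
        # Partial evidentiary support only.
        return "PARTIALLY VERIFIED"
    return "UNVERIFIED"
```

Note that the contradiction check comes first: a claim with both supporting and contradicting evidence is DISPUTED, not PARTIALLY VERIFIED.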

Cross-Assessment Consistency Analysis

An important secondary methodology used in this fact-checking effort is cross-assessment consistency analysis. When multiple independent AI models produce similar claims from the same prompt, this can indicate either that the claim is well-established in the models' shared training data, or that the models share a common error or bias.

Cross-model agreement was used as a supplementary signal, not as a primary verification criterion. Three models independently citing the same statistic increases our confidence that the statistic is well-established, but it does not constitute independent verification in the same way that multiple human sources would.

Consistency Findings

  • Highest consistency: Energy statistics (Hormuz volumes, oil import data), major weapons system specifications (Shahed costs, F-35I existence), and constitutional law facts (War Powers Resolution dynamics)
  • Moderate consistency: Escalation sequences, proxy activation patterns, diplomatic positioning assessments
  • Lowest consistency: Specific casualty figures, operation names, particular incident details, exact market price projections
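One way to make the consistency comparison concrete is to check whether the values the three models cite for the same statistic fall within a relative tolerance. The helper below is hypothetical (its name, the 5% tolerance, and the example figures are assumptions, not part of the methodology):

```python
def agreement_level(cited_values: dict[str, float],
                    tolerance: float = 0.05) -> str:
    """Report whether models agree on a statistic within a relative spread."""
    vals = list(cited_values.values())
    if len(vals) < 2:
        return "single-source"
    lo, hi = min(vals), max(vals)
    spread = (hi - lo) / hi if hi else 0.0
    return "consistent" if spread <= tolerance else "divergent"

# All three models citing ~20-21% for the Hormuz oil transit share
# would register as consistent; widely scattered casualty figures
# would register as divergent.
agreement_level({"Claude": 0.20, "Codex": 0.21, "Gemini": 0.20})
```

A check like this only flags agreement; as noted above, agreement is a supplementary signal, not verification.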

Limitations of This Analysis

This fact-checking effort has significant limitations that users must understand before relying on its conclusions.

AI-Performed Fact-Checking
This fact-checking was performed by an AI system (Claude), which means it is subject to the same potential errors as the assessments it is evaluating. The AI may hallucinate source citations, incorrectly recall institutional data, or apply flawed analytical frameworks. While the verification process is designed to be rigorous, it does not carry the authority of human expert review or institutional peer review. Users should treat these results as analytical guidance, not definitive truth.
Knowledge Cutoff
AI models have knowledge cutoffs that may affect the accuracy of verification. Data about military inventories, economic statistics, and geopolitical conditions may have changed between the model's training data cutoff and the assessment date. The verification reflects the AI's understanding of the state of affairs as of its training data, which may not capture the most recent developments.
Classified Information
Many military claims involve data that is classified or otherwise unavailable through open sources. Military inventories, production rates, intelligence assessments, and operational capabilities are often closely guarded information. Our verification is limited to what can be assessed through open-source intelligence (OSINT) and publicly available institutional reporting. Claims about classified capabilities may be rated PARTIALLY VERIFIED or UNVERIFIED not because they are inaccurate, but because we cannot access the relevant classified data.
Scenario vs. Reality Boundary
The most fundamental challenge in this fact-checking effort is the boundary between scenario elements and verifiable facts. The AI assessments seamlessly blend real background data with fabricated scenario events, making it sometimes difficult to distinguish which elements are drawn from the AI's knowledge base and which are generated as part of the scenario. Our classification system (Background Fact / Scenario Element / Analytical Projection) attempts to address this, but the boundary is not always clear-cut.
Selection Bias
The claims selected for fact-checking represent a subset of the total claims made across the three assessments. Selection prioritized claims that are most amenable to verification, most consequential for the assessments' conclusions, or most illustrative of common patterns. This means the verification statistics may not be representative of all claims in the assessments. Some categories of claims (e.g., psychological assessments of leaders, probability estimates for future events) are inherently resistant to fact-checking and are underrepresented in this analysis.

Source Distribution Across Assessments

The three assessments varied in their density of verifiable claims and their reliance on scenario-generated content.

  • Claude — ~32 claims extracted: ~40% background facts, ~35% scenario elements, ~25% projections
  • Codex — ~22 claims extracted: ~45% background facts, ~30% scenario elements, ~25% projections
  • Gemini — ~20 claims extracted: ~40% background facts, ~30% scenario elements, ~30% projections
Observations: The Claude assessment generated the highest volume of extractable claims, reflecting its generally longer and more detailed content. Codex showed the highest proportion of background facts relative to scenario elements, suggesting a more conservative approach to generating fictional scenario details. Gemini had a balanced distribution across all three claim types.

Recommendations for Users

How to Use These Assessments

  • Trust verified background facts (Hormuz oil volumes, weapons costs, institutional data) as generally reliable data grounded in established sources.
  • Treat scenario elements with maximum skepticism. Casualty figures, specific incidents, and operational outcomes are entirely AI-generated and should not be cited as factual.
  • Use analytical projections as frameworks for thinking about possibilities, not as predictions. The range of outcomes is often reasonable even when specific numbers are speculative.
  • Cross-reference across assessments. Where all three models agree on a background fact, confidence is higher. Where they diverge on scenario details, this confirms the fabricated nature of those details.
  • Do not cite these assessments as authoritative intelligence. They are AI-generated analyses produced for research and educational purposes only.
  • Be aware of this fact-check's own limitations. As an AI-generated verification, it may contain its own errors.