Methodology
How IMS turns raw public signals into corroborated, source-cited, tamper-evident incident records — written so a procurement reviewer or an auditor can follow the trail end-to-end.
1. Ingestion
Each integrated source family (see the live registry) feeds the pipeline at its own cadence — newswires push events as they happen; situation reports arrive on publication; geophysical feeds poll on intervals matched to their refresh rate. Every record carries:
- Source slug (canonical identifier — never the marketing brand name)
- Original URL
- Original publish timestamp (UTC, normalised)
- Raw payload (kept for audit; never shown unless an admin opens the audit log)
- Ingest timestamp (when our pipeline saw it)
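As a minimal sketch of the record shape listed above, the five fields can be modelled as an immutable dataclass. The class name, field names, and the rule that naive timestamps are rejected are assumptions made for illustration, not the production schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngestRecord:
    """One raw record as it enters the pipeline (hypothetical field names)."""
    source_slug: str        # canonical identifier, not the brand name
    original_url: str
    published_at: datetime  # original publish time, stored in UTC
    raw_payload: bytes      # retained for audit only
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        for name in ("published_at", "ingested_at"):
            value = getattr(self, name)
            # Assumption: naive timestamps are rejected rather than guessed at.
            if value.tzinfo is None:
                raise ValueError(f"{name} must be timezone-aware")
            # The dataclass is frozen, so store the UTC form via object.__setattr__.
            object.__setattr__(self, name, value.astimezone(timezone.utc))
```

Normalising at construction time means every later comparison (deduplication windows, latency checks) can assume UTC without re-checking.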
2. Deduplication
The same event reported by four sources surfaces as one incident — not four. The clustering decision is made on three independent signals:
- Entity overlap. Named entities (countries, organisations, people, places) extracted from each record. Records that share ≥ 2 entities are clustering candidates.
- Geographic proximity. Resolved coordinates within a configurable radius (default 25 km for hyper-local events, 100 km for regional).
- Temporal window. Records within a sliding 6-hour window for breaking events; 24 hours for slower-moving humanitarian incidents.
Text-similarity is used as a tie-breaker only — never as the sole basis. Pure-text clustering produces too many false positives across multilingual sources.
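The three signals above can be sketched as a pairwise candidate check. This is an illustration under stated assumptions: the `Signal` type and function names are hypothetical, all three signals are required to agree (the text does not say how they are combined), and distance is computed with the haversine formula on a 6371 km sphere. The text-similarity tie-breaker is left out.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Signal:
    entities: frozenset     # named entities extracted from the record
    lat: float              # resolved coordinates, decimal degrees
    lon: float
    published_at: datetime


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def is_cluster_candidate(
    a: Signal,
    b: Signal,
    radius_km: float = 25.0,                # 25 km hyper-local, 100 km regional
    window: timedelta = timedelta(hours=6), # 6 h breaking, 24 h humanitarian
    min_shared_entities: int = 2,
) -> bool:
    # Assumption: a pair qualifies only if all three signals agree.
    shares_entities = len(a.entities & b.entities) >= min_shared_entities
    is_near = haversine_km(a.lat, a.lon, b.lat, b.lon) <= radius_km
    in_window = abs(a.published_at - b.published_at) <= window
    return shares_entities and is_near and in_window


t0 = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
wire = Signal(frozenset({"Kenya", "Nairobi", "Red Cross"}), -1.29, 36.82, t0)
sitrep = Signal(frozenset({"Nairobi", "Red Cross", "OCHA"}), -1.25, 36.85,
                t0 + timedelta(hours=3))
print(is_cluster_candidate(wire, sitrep))
```

Keeping the radius and window as parameters mirrors the configurable defaults described above: the same function serves breaking and slower-moving events.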
3. Cross-corroboration
A signal is promoted to VERIFIED only when ≥ 2 sources of distinct type confirm the core facts. Distinct types are counted, not distinct sources:
- Two newswires count as 1 type. A newswire + a sitrep + a local channel = 3 types.
- Source-type taxonomy: newswire · humanitarian-sitrep · NGO-statement · government-release · academic · curated-local-channel · geophysical-instrument.
- Single-source items remain visible in the console but are explicitly flagged. They are never shown on the public marketing preview feed.
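The type-counting rule above can be sketched in a few lines. The function name and the `UNVERIFIED` label are hypothetical, and the sketch assumes every source passed in has already been judged to confirm the core facts. It only counts distinct types.

```python
SOURCE_TYPES = frozenset({
    "newswire",
    "humanitarian-sitrep",
    "NGO-statement",
    "government-release",
    "academic",
    "curated-local-channel",
    "geophysical-instrument",
})


def corroboration_status(confirming_types: list, min_types: int = 2) -> str:
    """Return "VERIFIED" when enough distinct source types confirm a signal.

    `confirming_types` holds one taxonomy label per confirming source.
    """
    unknown = set(confirming_types) - SOURCE_TYPES
    if unknown:
        raise ValueError(f"unknown source type(s): {sorted(unknown)}")
    distinct = set(confirming_types)  # two newswires collapse to one type
    # "UNVERIFIED" is a placeholder label for anything not yet promoted.
    return "VERIFIED" if len(distinct) >= min_types else "UNVERIFIED"
```

For example, `corroboration_status(["newswire", "newswire"])` stays unverified because both sources share one type, while a newswire plus a humanitarian sitrep is promoted.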
4. Source credibility scoring
Every active source carries a credibility score in [0.0, 1.0]. The score is the moving median of three sub-scores, weighted equally:
- Editorial reliability. Independent assessments from Reporters Without Borders, ACOS Alliance, and similar press-freedom indices, normalised to 0–1.
- Track record. The source's own historical retraction rate inside our pipeline. A source that retracts an incident is penalised; a source whose claims are independently confirmed by other sources is rewarded.
- Latency consistency. A penalty for sources that lag > 90 minutes behind the median first-reporter on shared events. Latency does not affect truth, but it does affect operational utility.
The current population-wide average across our 70+ active sources is published live on the status page. We do not show the per-source breakdown publicly to avoid creating an editorial pressure surface; institutional partners get the full table on request.
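One possible reading of the per-source score described above, sketched under loud assumptions: in each scoring period the three sub-scores are combined with equal weights, and the published score is the median of the most recent combined values. The class name, the window length, and this two-step interpretation are all hypothetical. The text does not spell out the exact computation.

```python
from collections import deque
from statistics import median


class CredibilityScore:
    """Moving median over per-period, equally weighted sub-score means."""

    def __init__(self, window: int = 5) -> None:
        # Assumption: only the last `window` scoring periods are considered.
        self._history = deque(maxlen=window)

    def update(self, editorial: float, track_record: float, latency: float) -> float:
        subscores = (editorial, track_record, latency)
        if any(not 0.0 <= s <= 1.0 for s in subscores):
            raise ValueError("sub-scores must lie in [0.0, 1.0]")
        combined = sum(subscores) / 3  # equal weights
        self._history.append(combined)
        return median(self._history)
```

A median over recent periods is robust to a single outlier period: one bad period does not move the score much, but a sustained change does.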
5. Geo-resolution
Locations are resolved at ingest time, never inferred from the headline alone. The resolver chain, tried in order:
- Explicit coordinates in the source payload (USGS earthquake feeds, OCHA structured sitreps).
- Named places extracted, then geocoded against an internal gazetteer (OSM-derived).
- Country-level fallback, used only when neither step above yields a location.
Each incident card surfaces the highest-precision resolution available and explicitly marks low-precision resolutions with the radius indicator.
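The resolver chain can be sketched as a sequence of fallbacks that records which step produced the result. The payload keys, the `Resolution` type, and the two toy lookup tables (with approximate coordinates) are hypothetical stand-ins. The real gazetteer is internal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resolution:
    lat: float
    lon: float
    precision: str  # "coordinates", "named-place", or "country"


# Toy stand-ins for the gazetteer and country table (approximate values).
GAZETTEER = {"goma": (-1.679, 29.228)}
COUNTRY_CENTROIDS = {"CD": (-4.04, 21.76)}


def resolve(payload: dict) -> Optional[Resolution]:
    # Step 1: explicit coordinates in the source payload win outright.
    coords = payload.get("coordinates")
    if coords is not None:
        return Resolution(coords[0], coords[1], "coordinates")
    # Step 2: first extracted place name found in the gazetteer.
    for place in payload.get("places", []):
        hit = GAZETTEER.get(place.lower())
        if hit is not None:
            return Resolution(hit[0], hit[1], "named-place")
    # Step 3: country-level fallback only.
    centroid = COUNTRY_CENTROIDS.get(payload.get("country", ""))
    if centroid is not None:
        return Resolution(centroid[0], centroid[1], "country")
    return None
```

Carrying the `precision` label with the coordinates is what would let an incident card mark a low-precision resolution rather than presenting a country centroid as a point location.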
6. Severity rubric
Severity is a 1–5 scale, scored on three axes (impact, scope, escalation potential), summed and normalised:
- 1 — Watch. Anomaly worth seeing. Not actionable in itself.
- 2 — Note. Operational awareness. Affects routine planning.
- 3 — Concern. Calls for a brief. Affects active operations / staff safety.
- 4 — Material. Calls for escalation. Affects mission posture.
- 5 — Critical. Calls for immediate action. Affects life safety.
The rubric is published at the bottom of every incident card in the console — never a black-box number.
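A minimal sketch of how three axis scores could map onto the 1–5 scale. The text says the axes are summed and normalised but not how, so two things here are assumptions: each axis is scored 1–5, and normalisation means dividing the sum by three and rounding to the nearest level.

```python
LEVELS = {1: "Watch", 2: "Note", 3: "Concern", 4: "Material", 5: "Critical"}


def severity(impact: int, scope: int, escalation: int) -> tuple:
    """Return (level, label) from three axis scores, each assumed to be 1-5."""
    axes = (impact, scope, escalation)
    if any(not 1 <= a <= 5 for a in axes):
        raise ValueError("each axis must be scored from 1 to 5")
    # Sum is 3..15; dividing by the number of axes maps it back onto 1..5.
    level = round(sum(axes) / len(axes))
    return level, LEVELS[level]
```

Returning the label alongside the number keeps the result readable on its own, in the spirit of publishing the rubric with every card.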
7. PDF chain-of-custody
Every generated brief carries:
- Classification banner (default: OFFICIAL USE ONLY · IMS BRIEF; customer-overridable)
- Reporting period, generation timestamp, generating user
- Per-incident source citations with timestamps
- SHA-256 hash computed over the canonical content payload (rows + metadata, JSON-canonical, sorted keys). Visible in the footer.
The hash is verifiable offline by any reader who has the original JSON payload. We publish the algorithm so independent verification needs no proprietary tool. A reader who finds a brief whose hash does not match its content has either a corrupted file or a forgery — both worth investigating.
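Offline verification can be sketched with the Python standard library alone. The function names are hypothetical, and the exact canonical form (compact separators, UTF-8, non-ASCII characters left unescaped) is an assumption layered on top of the "sorted keys" rule stated above. A verifier must match the published algorithm byte for byte.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialise with sorted keys so key order never changes the hash."""
    text = json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),  # assumption: no insignificant whitespace
        ensure_ascii=False,     # assumption: raw UTF-8 rather than \u escapes
    )
    return text.encode("utf-8")


def brief_hash(payload: dict) -> str:
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


def verify(payload: dict, footer_hash: str) -> bool:
    """Compare the recomputed hash with the one printed in the brief footer."""
    return hmac.compare_digest(brief_hash(payload), footer_hash.strip().lower())
```

Because keys are sorted before hashing, two payloads that differ only in key order produce the same digest, while any change to a row or metadata value produces a different one.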
8. Forwarding & audit
Every forward action (analyst sends a brief to a superior) writes an audit-log row:
- Brief ID + SHA-256 hash
- From / to / cc
- Timestamp
- Optional context note from sender
Subsequent opens of the brief (via the secure link) append further rows: who opened, from where (geo-IP, never browser fingerprint), when. The audit log is queryable by any user with read-access to the originating brief.
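The forward and open rows described above can be sketched as an append-only log. The class and field names are hypothetical, storage is in memory for illustration, and access control (read-access to the originating brief) is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRow:
    brief_id: str
    sha256: str
    action: str                   # "forward" or "open"
    actor: str                    # sender, or the user who opened the brief
    timestamp: datetime
    to: tuple = ()
    cc: tuple = ()
    note: Optional[str] = None    # optional context note from the sender
    geo_ip: Optional[str] = None  # coarse location for opens; no fingerprint


class AuditLog:
    """Append-only: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows = []

    def _append(self, row: AuditRow) -> AuditRow:
        self._rows.append(row)
        return row

    def record_forward(self, brief_id, sha256, sender, to, cc=(), note=None):
        now = datetime.now(timezone.utc)
        return self._append(AuditRow(brief_id, sha256, "forward", sender, now,
                                     tuple(to), tuple(cc), note))

    def record_open(self, brief_id, sha256, opener, geo_ip):
        now = datetime.now(timezone.utc)
        return self._append(AuditRow(brief_id, sha256, "open", opener, now,
                                     geo_ip=geo_ip))

    def rows_for(self, brief_id: str) -> list:
        return [row for row in self._rows if row.brief_id == brief_id]
```

Storing the brief hash on every row ties each forward and open to the exact content that was sent, so a later hash mismatch can be traced to a point in the chain.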
9. What we deliberately do NOT do
- No "AI confidence score" on incident cards. The corroboration count and the source list are the trust signal. A single unexplainable percentage is not.
- No analyst-voice text generated by the system without an explicit human-in-the-loop step. Brief executive-summary text is templated and fact-stitched, not free-form generated.
- No facial recognition. No PII scraping. No data sourced from undisclosed local channels. If we add a regional channel, it will be named in the source registry.
- No predictive alerting. We surface what is happening, not what we predict will happen.
10. Methodology versioning
This page describes methodology v1, in production since 2026-04. Material changes will increment the version and be announced in the changelog. Briefs generated under earlier methodology versions remain valid; their footer encodes the methodology version active at generation time.
Questions or technical due-diligence on the methodology: trust@quintarthai.com.