CIA World Factbook 1990 – 2025
The CIA World Factbook was published continuously from 1962 until its discontinuation on February 4, 2026. This archive preserves every publicly available edition from 1990 to 2025 in a structured, queryable SQL Server database.
Data was collected from three sources: Project Gutenberg and CIA original text files (1990s plain text editions), the Wayback Machine (2000-2020 HTML archives), and the factbook/cache.factbook.json GitHub repository (2021-2025 JSON snapshots).
| Years | Source | Format |
|---|---|---|
| 1990-1999 | Project Gutenberg + CIA original (1996) | Plain text (4 format variants) |
| 2000-2020 | Wayback Machine | HTML zip archives (5 parser generations) |
| 2021-2025 | GitHub (factbook/cache.factbook.json) | JSON with year-end git snapshots |
| Table | Rows | Description |
|---|---|---|
| MasterCountries | 281 | Canonical entities with EntityType, ISO codes |
| Countries | 9,536 | Per-year country records |
| CountryCategories | 83,682 | Section headings (Geography, Economy, etc.) |
| CountryFields | 1,071,603 | Individual data fields (~238 MB) |
| FieldNameMappings | 1,132 | Maps 1,132 variants to 416 canonical names |
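As a rough illustration of how these tables relate, the sketch below builds a tiny stand-in schema and follows one field across years. Only the table names come from the archive. Every column name, the `field_history` helper, and the sample rows are assumptions made for the example, and SQLite is used in place of SQL Server only so the snippet runs without a server.

```python
import sqlite3

# Hypothetical columns: the real schema lives in schema/create_tables.sql.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterCountries (MasterCountryID INTEGER PRIMARY KEY,
                              CanonicalName TEXT, EntityType TEXT);
CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY,
                        MasterCountryID INTEGER, Year INTEGER);
CREATE TABLE CountryCategories (CategoryID INTEGER PRIMARY KEY,
                                CountryID INTEGER, CategoryTitle TEXT);
CREATE TABLE CountryFields (FieldID INTEGER PRIMARY KEY,
                            CategoryID INTEGER, FieldName TEXT, Content TEXT);
""")

# Placeholder rows, not real Factbook values.
conn.execute("INSERT INTO MasterCountries VALUES (1, 'France', 'sovereign')")
conn.executemany("INSERT INTO Countries VALUES (?, ?, ?)",
                 [(10, 1, 1990), (11, 1, 2025)])
conn.executemany("INSERT INTO CountryCategories VALUES (?, ?, ?)",
                 [(100, 10, 'Geography'), (101, 11, 'Geography')])
conn.executemany("INSERT INTO CountryFields VALUES (?, ?, ?, ?)",
                 [(1000, 100, 'Area', 'placeholder 1990 text'),
                  (1001, 101, 'Area', 'placeholder 2025 text')])


def field_history(conn, country, field_name):
    """Return (year, raw text) pairs for one field of one entity, oldest first."""
    return conn.execute(
        """
        SELECT c.Year, f.Content
        FROM MasterCountries m
        JOIN Countries c ON c.MasterCountryID = m.MasterCountryID
        JOIN CountryCategories cat ON cat.CountryID = c.CountryID
        JOIN CountryFields f ON f.CategoryID = cat.CategoryID
        WHERE m.CanonicalName = ? AND f.FieldName = ?
        ORDER BY c.Year
        """,
        (country, field_name),
    ).fetchall()


history = field_history(conn, "France", "Area")
```

The join path (entity, per-year record, category, field) mirrors the row counts above: each level fans out into the next.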

The data is provided as SQL INSERT scripts compatible with SQL Server 2017+. The CountryFields table is split into 36 gzipped files by year (~59 MB total compressed).
To restore the archive locally:
2. Run schema/create_tables.sql
3. Import data/master_countries.sql, then countries.sql, categories.sql, field_name_mappings.sql
4. Decompress and import each data/fields/country_fields_YYYY.sql.gz
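The decompression step can be scripted. The following is a minimal sketch that assumes the per-year files sit under `data/fields/` with the `country_fields_YYYY.sql.gz` naming shown above. The helper names `field_script_paths` and `decompress_script` are hypothetical and not part of the repository. Files are copied as raw bytes, so no assumption is made about the scripts' text encoding.

```python
import gzip
import shutil
from pathlib import Path

FIRST_YEAR, LAST_YEAR = 1990, 2025  # one file per edition, 36 in total


def field_script_paths(data_dir):
    """List the expected per-year CountryFields scripts in chronological order."""
    return [
        Path(data_dir) / "fields" / f"country_fields_{year}.sql.gz"
        for year in range(FIRST_YEAR, LAST_YEAR + 1)
    ]


def decompress_script(gz_path, out_dir):
    """Write the decompressed .sql file into out_dir and return its path."""
    gz_path = Path(gz_path)
    out_path = Path(out_dir) / gz_path.name[:-len(".gz")]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large files stay off the heap
    return out_path
```

Each resulting `.sql` file can then be run against the database with whichever SQL Server client you normally use.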
The raw CIA World Factbook changed format at least 10 times between 1990 and 2025. The ETL pipeline downloads original source data, parses every format variant, and loads structured results into SQL Server via pyodbc.
| Script | Years | What It Does |
|---|---|---|
| build_archive.py | 2000–2020 | Downloads HTML zips from the Wayback Machine, detects which of 5 HTML layouts each year uses, and parses fields |
| load_gutenberg_years.py | 1990–2001 | Parses plain-text Gutenberg editions with 4 distinct format variants (tagged, asterisk, at-sign, colon); 1996 supplemented by CIA original text |
| reload_json_years.py | 2021–2025 | Checks out year-end git commits from factbook/cache.factbook.json and loads structured JSON |
| build_field_mappings.py | All | Maps 1,132 raw field name variants to 416 canonical names using a 7-rule system |
| classify_entities.py | All | Auto-classifies 281 entities into 9 types based on Dependency Status and Government Type fields |
| validate_integrity.py | All | Read-only validation with 9 checks: field count benchmarks, ground truth, year-over-year consistency |
| parse_field_values.py | All | Decomposes 1,071,603 text blobs into 1,775,588 typed sub-values using 55 field-specific parsers + SourceFragment provenance (land/water, male/female, sex ratios, literacy, revenues/expenditures, age brackets, CO2 emissions, etc.) |
| validate_field_values.py | All | Validates parsed FieldValues: coverage (98.9%), numeric extraction rate (61.5%), spot checks against known ground truth |
The 1,071,603 fields in CountryFields store raw text blobs with pipe (|) delimiters separating sub-fields. The structured parsing pipeline decomposes these into 1,775,588 individually queryable, typed sub-values across 2,599 distinct sub-fields using 55 dedicated parsers. Each row includes a SourceFragment showing the exact text slice that produced the value. This enables SQL queries that were previously impossible without per-query regex.
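To make the decomposition concrete, here is a simplified sketch of splitting one pipe-delimited blob into typed sub-values while keeping the originating text slice. It is not one of the 55 production parsers. The `SubValue` class, the `label: number units` blob shape, and the sample numbers are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubValue:
    sub_field: str            # e.g. "land"
    numeric: Optional[float]  # None when no number could be extracted
    units: Optional[str]
    source_fragment: str      # exact slice of the blob that produced this row


# A number with optional thousands separators, then optional trailing units.
_NUMBER = re.compile(r"(-?\d[\d,]*(?:\.\d+)?)\s*([^\d\s(][^(]*)?")


def parse_blob(blob: str) -> List[SubValue]:
    """Split a pipe-delimited field into sub-values (assumed 'label: value' form)."""
    values = []
    for fragment in blob.split("|"):
        fragment = fragment.strip()
        if not fragment:
            continue
        label, sep, rest = fragment.partition(":")
        if not sep:  # no label present, treat the whole fragment as the value
            label, rest = "value", fragment
        match = _NUMBER.search(rest)
        numeric = float(match.group(1).replace(",", "")) if match else None
        units = match.group(2).strip() if match and match.group(2) else None
        values.append(SubValue(label.strip(), numeric, units or None, fragment))
    return values


# Illustrative blob, not taken from a real edition.
rows = parse_blob("total: 1,000 sq km | land: 900 sq km | water: 100 sq km")
```

Keeping `source_fragment` next to each parsed number is what lets a reader audit any value against the text it came from.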
Download: factbook.db (~662 MB) from Release v3.5 — single self-contained database
Live dashboard: worldfactbookarchive.org/analysis/structured-data — interactive charts with SQL and source data tabs
The CIA never maintained a stable schema. Every few years the HTML layout changed completely, field names were renamed without notice, and entire categories were restructured.
The 1990s plain-text editions used four different formatting conventions across the decade. The 1990–1993 editions used indented fields, 1994 introduced tagged markers, 1996 switched to bare section headers, and 1999 changed the delimiter scheme again. Each variant required its own regex-based parser.
2000–2020 (HTML)
The CIA redesigned the Factbook website at least 5 times. The 2000 edition used inline <b> formatting. By 2004 it switched to table layouts. 2008 introduced CollapsiblePanel JavaScript widgets. 2014 changed to expand/collapse sections. 2017 moved to field-anchor div structures. A parser that worked on 2006 data would produce garbage on 2010 data.
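One simple way to route an edition to the right parser is a year lookup over the redesign dates listed above. This is only a sketch: the layout names are made up, the start years are read loosely from the paragraph ("by 2004" may mean the switch happened earlier), and the actual pipeline is described as detecting the layout from each download rather than from the year alone.

```python
from bisect import bisect_right

# Approximate first year of each HTML layout, per the description above.
LAYOUT_START_YEARS = [2000, 2004, 2008, 2014, 2017]
# Hypothetical labels for the five parser generations.
LAYOUT_NAMES = [
    "inline_bold",
    "table",
    "collapsible_panel",
    "expand_collapse",
    "field_anchor",
]


def layout_for_year(year: int) -> str:
    """Pick the layout whose start year is the latest one not after `year`."""
    if not 2000 <= year <= 2020:
        raise ValueError(f"{year} is outside the HTML era (2000-2020)")
    return LAYOUT_NAMES[bisect_right(LAYOUT_START_YEARS, year) - 1]
```

Under these assumed boundaries, 2006 and 2010 resolve to different layouts, which is why a single parser cannot cover both.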
Field Name Drift
The CIA renamed fields silently over the decades. “GDP - real growth rate” became “Real GDP growth rate.” “Telephones” split into “Telephones - fixed lines” and “Telephones - mobile cellular.” The field mapping script tracks all 1,132 variants through 7 rule layers.
| Year | Source | Countries | Fields |
|---|---|---|---|
| 1990 | TEXT | 249 | 15,750 |
| 1991 | TEXT | 247 | 14,903 |
| 1992 | TEXT | 264 | 17,372 |
| 1993 | TEXT | 266 | 18,509 |
| 1994 | TEXT | 266 | 18,761 |
| 1995 | TEXT | 266 | 19,599 |
| 1996 | TEXT | 266 | 20,764 |
| 1997 | TEXT | 266 | 23,405 |
| 1998 | TEXT | 266 | 23,524 |
| 1999 | TEXT | 266 | 25,178 |
| 2000 | HTML | 267 | 25,724 |
| 2001 | TEXT | 265 | 27,281 |
| 2002 | HTML | 268 | 27,430 |
| 2003 | HTML | 268 | 28,676 |
| 2004 | HTML | 271 | 28,958 |
| 2005 | HTML | 271 | 28,728 |
| 2006 | HTML | 262 | 28,950 |
| 2007 | HTML | 259 | 29,096 |
| 2008 | HTML | 261 | 30,753 |
| 2009 | HTML | 260 | 30,818 |
| 2010 | HTML | 262 | 30,805 |
| 2011 | HTML | 262 | 33,634 |
| 2012 | HTML | 262 | 35,183 |
| 2013 | HTML | 267 | 36,729 |
| 2014 | HTML | 267 | 36,679 |
| 2015 | HTML | 266 | 36,868 |
| 2016 | HTML | 268 | 36,804 |
| 2017 | HTML | 268 | 37,046 |
| 2018 | HTML | 268 | 37,285 |
| 2019 | HTML | 268 | 37,394 |
| 2020 | HTML | 268 | 36,687 |
| 2021 | JSON | 260 | 39,714 |
| 2022 | JSON | 260 | 37,344 |
| 2023 | JSON | 260 | 37,558 |
| 2024 | JSON | 260 | 34,838 |
| 2025 | JSON | 260 | 32,594 |
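Counts like these lend themselves to a year-over-year consistency check. The sketch below uses the 2019–2025 field counts from the table and flags any year whose count moves by more than a threshold relative to the previous year. The 7% threshold and the `flag_jumps` helper are assumptions for illustration, not the rules used by validate_integrity.py.

```python
# Field counts copied from the table above (2019-2025 only).
FIELD_COUNTS = {
    2019: 37394,
    2020: 36687,
    2021: 39714,
    2022: 37344,
    2023: 37558,
    2024: 34838,
    2025: 32594,
}


def flag_jumps(counts, threshold=0.07):
    """Return (year, percent change) for years that move more than `threshold`."""
    flagged = []
    years = sorted(counts)
    for prev, cur in zip(years, years[1:]):
        change = (counts[cur] - counts[prev]) / counts[prev]
        if abs(change) > threshold:
            flagged.append((cur, round(change * 100, 1)))
    return flagged


flagged = flag_jumps(FIELD_COUNTS)
```

A flagged year is not necessarily an error. It may reflect a source change, such as the move from HTML to JSON in 2021, so a check like this is best treated as a prompt for manual review.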
| Type | Count | Description |
|---|---|---|
| sovereign | 192 | Independent states |
| territory | 65 | Dependencies, overseas territories |
| misc | 7 | Oceans, World, European Union |
| disputed | 6 | Kosovo, Gaza Strip, West Bank, etc. |
| crown dep. | 3 | Jersey, Guernsey, Isle of Man |
| freely assoc. | 3 | Marshall Islands, Micronesia, Palau |
| special admin | 2 | Hong Kong, Macau |
| dissolved | 2 | Netherlands Antilles, Serbia and Montenegro |
| antarctic | 1 | Antarctica |
The CIA renamed many fields over the 36-year span. The FieldNameMappings table maps 1,132 raw field name variants to 416 canonical names:
| Mapping Type | Count | Description |
|---|---|---|
| Identity | 184 | Modern field names (unchanged) |
| Rename | 159 | CIA renamed the field (e.g. “GDP - real growth rate” → “Real GDP growth rate”) |
| Dash format | 64 | Formatting differences (single vs double dashes) |
| Consolidation | 48 | Sub-fields merged into parents (e.g. Oil → Petroleum) |
| Country-specific | 354 | Regional sub-entries, government body names |
| Noise | 281 | Parser artifacts, fragments (flagged IsNoise=1) |
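The first three mapping types can be sketched in a few lines. This is an assumed simplification, not the 7-rule system in build_field_mappings.py: the `RENAMES` dictionary holds only the one rename quoted above, and the function names and type labels are hypothetical.

```python
import re

# Lower-cased raw name -> canonical name. Only the documented example is included.
RENAMES = {"gdp - real growth rate": "Real GDP growth rate"}


def normalize_dashes(name: str) -> str:
    """Rewrite double dashes and spaced single dashes as ' - '."""
    return re.sub(r"\s*--\s*|\s+-\s+", " - ", name.strip())


def canonical_name(raw: str):
    """Return (canonical name, mapping type) for a raw field name."""
    cleaned = normalize_dashes(raw)
    key = cleaned.lower()
    if key in RENAMES:
        return RENAMES[key], "rename"
    if cleaned != raw:
        return cleaned, "dash_format"
    return cleaned, "identity"
```

For example, `canonical_name("Telephones -- mobile cellular")` yields the single-dash form with type `dash_format`, while hyphenated words without surrounding spaces are left alone.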
World Leaders Database — Comprehensive leadership data for 200+ countries with governance analysis, power concentration metrics, and security apparatus tracking.
CIA Studies in Intelligence — Full-text searchable archive of declassified CIA journal articles with publication analytics and topic trends.
Geopolitical Atlas — Territorial disputes, infrastructure overlays, and OSINT missile facility data on interactive maps.
Demographics — Population pyramids with country comparison, animation, and historical overlay.
Scatter Plot Analysis — Multi-indicator scatter with regression lines, outlier detection, and region filtering.
• Dashboard builder — Custom analytical dashboards with drag-and-drop chart layout
• Analytics redesign — Traffic, audience, and security monitoring with the Dark Intelligence theme
All data originates from the CIA World Factbook, a public-domain U.S. Government publication. This project is not affiliated with the CIA or U.S. Government.
Country Dictionary
281 entities
| Entity Name | FIPS | ISO-2 | Type | Coverage | Fields |
|---|---|---|---|---|---|