A local reconciliation service for OCLC FAST (Faceted Application of Subject Terminology), compatible with OpenRefine and the W3C Reconciliation Service API v0.2.
Match subject headings against 2+ million FAST authority records with full-text search.
- Full-text search with FTS5 for fast, fuzzy matching
- Type filtering by FAST facet (Topical, Personal, Geographic, etc.)
- OpenRefine compatible via datasette-reconcile plugin
- Self-contained: runs in a Python virtual environment or in Docker
- Offline operation - works without internet after initial setup
- LCSH cross-references preserved from source data

Docker:

    docker compose up -d     # First run downloads data and builds DB (~30-60 min)
    docker compose logs -f   # Watch progress

Native:

    make build   # Downloads data, creates venv, builds DB
    make serve   # Start server

OCLC uses Cloudflare protection, which may block automated downloads. If `make build` fails with a download error:

- Download `FASTAll.marcxml.zip` from OCLC manually in your browser.
- Save it to `data/FASTAll.marcxml.zip`:

      mkdir -p data
      mv ~/Downloads/FASTAll.marcxml.zip data/

- Run the build again (native or Docker; both use the same `./data/` directory):

      make build             # Native
      # or
      docker compose up -d   # Docker

The service will be available at:
http://127.0.0.1:8001/fast/FAST/-/reconcile
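
Outside OpenRefine, the endpoint can also be queried from a script. The following is a minimal sketch, assuming the service accepts a standard Reconciliation API v0.2 batch sent as a form-encoded `queries` parameter over POST. The helper names (`build_queries`, `build_request`, `reconcile`) are hypothetical and not part of this project, and whether the `type` value is honored depends on how the plugin is configured.

```python
import json
from urllib.parse import urlencode, parse_qs
from urllib.request import Request, urlopen

ENDPOINT = "http://127.0.0.1:8001/fast/FAST/-/reconcile"


def build_queries(terms, facet=None, limit=5):
    """Build a v0.2-style query batch: {"q0": {...}, "q1": {...}, ...}."""
    queries = {}
    for i, term in enumerate(terms):
        query = {"query": term, "limit": limit}
        if facet is not None:
            # Assumption: facet names such as "Topical" are valid type ids.
            query["type"] = facet
        queries[f"q{i}"] = query
    return queries


def build_request(terms, facet=None, limit=5, endpoint=ENDPOINT):
    """Prepare (but do not send) a form-encoded POST carrying the batch."""
    body = urlencode({"queries": json.dumps(build_queries(terms, facet, limit))})
    return Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def reconcile(terms, facet=None, limit=5, endpoint=ENDPOINT):
    """Send the batch and decode the JSON reply (requires a running server)."""
    with urlopen(build_request(terms, facet, limit, endpoint)) as response:
        return json.load(response)
```

With the server running, `reconcile(["Jazz music", "Climate change"], facet="Topical")` should return one entry per query key (`q0`, `q1`), each holding a `result` list of candidates.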
The XSLT transformation of large files (especially FASTPersonal.marcxml at 1.7GB) requires significant memory. Docker containers may have insufficient memory allocation, causing truncated output files and build failures.
Recommended workflow for reliability:

- Build the database natively (macOS/Linux can dynamically allocate memory):

      make build   # Full pipeline with native Saxon

- Serve via Docker for network deployment:

      docker compose up -d   # Uses the same ./data/ directory

Both native and Docker builds share the ./data/ directory, so you can:
- Build once natively where memory is plentiful
- Serve via Docker for consistent deployment across machines
- Skip the lengthy transformation on subsequent Docker deployments
If you must build entirely in Docker, increase Docker Desktop memory allocation:
- Docker Desktop → Settings → Resources → Memory → 12-16GB
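
Because a memory-starved transformation can leave a truncated output file behind, it may help to check that an intermediate XML file is well-formed before continuing the build. The sketch below is one possible check using only the standard library; the function name `is_well_formed` is hypothetical and this check is not part of the project's Makefile.

```python
import xml.etree.ElementTree as ET


def is_well_formed(path):
    """Return True if the XML file at `path` parses to the end without error.

    The file is streamed with iterparse, and each element is cleared once it
    has been fully read, so large files are not held in memory all at once.
    """
    try:
        for _event, element in ET.iterparse(path, events=("end",)):
            element.clear()
    except ET.ParseError:
        # A truncated file typically ends with unclosed tags or no root element.
        return False
    return True
```

A truncated file (for example, one that ends before its root element is closed) makes the function return `False`, which is a signal to rebuild with more memory.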

To use the service in OpenRefine:

- Column dropdown → Reconcile → Start reconciling...
- Click Add Standard Service...
- Enter: `http://127.0.0.1:8001/fast/FAST/-/reconcile`
- Optionally filter by type (e.g., `Topical` for subjects, `Personal` for people)

| Docker | Native | Description |
|---|---|---|
| `docker compose up -d` | `make build && make serve` | Build and run |
| `docker compose down` | `Ctrl+C` | Stop |
| `docker compose run --rm fast make status` | `make status` | Show stats |
| `docker compose run --rm fast make update` | `make update` | Re-download data |
| `docker compose down -v` | `make clean-all` | Remove everything |

Filter reconciliation by FAST facet type:
| Type | Description | Example |
|---|---|---|
| `Topical` | Subjects, concepts, activities | "Climate change", "Jazz music" |
| `Personal` | People, authors, historical figures | "Shakespeare, William", "Einstein, Albert" |
| `Corporate` | Organizations, companies | "United Nations", "Apple Inc." |
| `Geographic` | Places, regions, countries | "London (England)", "Amazon River" |
| `Event` | Wars, conferences, events | "World War, 1939-1945" |
| `Chronological` | Time periods, eras | "Twentieth century" |
| `Title` | Works, publications | "Bible", "Hamlet" |
| `FormGenre` | Document types, formats | "Dictionaries", "Science fiction" |
| `Meeting` | Conferences, symposia | "Olympic Games" |

For targeted reconciliation, use facet-specific endpoints:
http://127.0.0.1:8001/fast/FASTTopical/-/reconcile
http://127.0.0.1:8001/fast/FASTPersonal/-/reconcile
http://127.0.0.1:8001/fast/FASTGeographic/-/reconcile
http://127.0.0.1:8001/fast/FASTCorporate/-/reconcile
http://127.0.0.1:8001/fast/FASTEvent/-/reconcile
http://127.0.0.1:8001/fast/FASTChronological/-/reconcile
http://127.0.0.1:8001/fast/FASTTitle/-/reconcile
http://127.0.0.1:8001/fast/FASTFormGenre/-/reconcile
http://127.0.0.1:8001/fast/FASTMeeting/-/reconcile
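
The endpoint URLs above follow a single pattern (`FAST` plus the facet name), so a script can derive them rather than hard-code all nine. This is a small sketch under that assumption; `facet_endpoint` and `BASE_URL` are hypothetical names introduced for illustration.

```python
FACETS = (
    "Topical", "Personal", "Geographic", "Corporate", "Event",
    "Chronological", "Title", "FormGenre", "Meeting",
)

BASE_URL = "http://127.0.0.1:8001/fast"


def facet_endpoint(facet=None, base_url=BASE_URL):
    """Return the reconcile URL for one facet, or the combined endpoint if None."""
    if facet is None:
        return f"{base_url}/FAST/-/reconcile"
    if facet not in FACETS:
        raise ValueError(f"Unknown FAST facet: {facet!r}")
    return f"{base_url}/FAST{facet}/-/reconcile"
```

Rejecting unknown facet names up front avoids sending requests to a table that does not exist, for example after a typo such as `Topic`.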
The build process transforms OCLC's MARC XML authority files:
OCLC FAST Data (MARC XML)
│
▼ fast2skos.xsl (Saxon)
SKOS/RDF
│
▼ skos2csv-reconcile.xsl (Saxon)
CSV (9 facet files)
│
▼ sqlite-utils
SQLite + FTS5 Index
│
▼ datasette-reconcile
W3C Reconciliation API
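
To illustrate the last two stages, here is a minimal sketch of an FTS5 index over heading labels. It uses Python's built-in `sqlite3` module rather than the sqlite-utils tool the build actually uses, and it assumes the local SQLite library was compiled with FTS5. The table name `fast_fts`, the column layout, and the sample ids are hypothetical placeholders, not the project's real schema or real FAST identifiers.

```python
import sqlite3

# Hypothetical sample rows; the ids are placeholders, not real FAST identifiers.
ROWS = [
    ("id-1", "Climate change", "Topical"),
    ("id-2", "Jazz music", "Topical"),
    ("id-3", "Shakespeare, William", "Personal"),
    ("id-4", "London (England)", "Geographic"),
]


def build_index(rows):
    """Create an in-memory FTS5 table; only the label column is searchable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE fast_fts "
        "USING fts5(id UNINDEXED, label, type UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO fast_fts (id, label, type) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn, text, facet=None, limit=5):
    """Return (id, label, type) rows whose label contains every query token."""
    # Quote each token so punctuation in headings is not parsed as FTS5 syntax.
    tokens = text.replace('"', " ").split()
    if not tokens:
        return []
    match = " ".join(f'"{token}"' for token in tokens)
    sql = "SELECT id, label, type FROM fast_fts WHERE fast_fts MATCH ?"
    params = [match]
    if facet is not None:
        sql += " AND type = ?"
        params.append(facet)
    sql += " ORDER BY rank LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Calling `search(build_index(ROWS), "jazz")` returns the single "Jazz music" row; passing `facet="Personal"` restricts results to that facet, which mirrors the type filter described earlier.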
Docker: Docker Desktop (Windows, macOS, Linux)
Native:
- Python 3.10+
- Java JDK (for Saxon XSLT processor)
- Saxon HE (`brew install saxon` on macOS)
- curl, unzip, make

Disk space: ~8GB (500MB download → 2GB SKOS → 500MB CSV → 1.5GB database)
Build time: 30-60 minutes (Saxon transformation is CPU-intensive)
For development or debugging, run pipeline stages individually:

    make download   # Get FASTAll.marcxml.zip from OCLC
    make extract    # Unzip to data/marcxml/
    make skos       # Convert MARC XML → SKOS (slow)
    make csv        # Convert SKOS → CSV
    make build      # Build SQLite database from CSV

FAST data is provided by OCLC under the ODC-BY license.
Credit: OCLC Research - https://www.oclc.org/research/areas/data-science/fast.html
- OCLC FAST - Faceted Application of Subject Terminology
- searchFAST - OCLC's online FAST search
- datasette - Tool for exploring and publishing data
- datasette-reconcile - Reconciliation API plugin
- OpenRefine - Data cleaning tool
- W3C Reconciliation API - Specification
See xslt/docs/ for guidelines on the MARC → SKOS transformation, including:
- ATHENA D4.2 Guidelines for mapping into SKOS
- SKOS mapping analysis notes