Real Estate Market Intelligence Platform: MLS Data Pipeline

A pipeline for MLS listing data. It ingests manual CSV exports, normalizes inconsistent field formats, and stores canonical records with full snapshot history for point-in-time queries.

Data handling and MLS compliance

This project processes MLS listing data exported from MLSListings, the multiple listing service that serves SAMCAR (San Mateo County Association of REALTORS) and surrounding Northern California associations. MLS listing data is provided under a limited, personal-use license to subscribed real estate professionals. Subscribers hold a revocable license to use the data for purposes permitted under the rules of their MLS; they do not own the underlying data.

For this reason, this project is intentionally designed to run locally only:

All ingestion, normalization, and storage happens on a single developer machine
No cloud deployments, no managed databases, no external sync of MLS exports
Sample CSVs and ingestion logs are excluded from version control via .gitignore
This project is not an IDX, VOW, or syndication implementation, and it does not display, distribute, or share MLS data with any third party

Any extension of this project to cloud infrastructure would require explicit authorization from MLSListings under their vendor or licensee program. The design choices documented here reflect the conservative posture of a personal-use subscriber, not legal interpretation.
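The local-only rule is stated here as policy; this README does not say it is enforced in code. If you wanted a tripwire, one option is to check the database URL before any stage runs. The sketch below is hypothetical: `LocalOnlyGuard`, `RemoteDatabaseError`, and the idea of reading the URL from `DATABASE_URL` are assumptions, not part of the project.

```ruby
require "uri"

# HYPOTHETICAL guard, not part of the project: refuse to run unless the
# database URL points at this machine (localhost, loopback, or a Unix socket).
module LocalOnlyGuard
  class RemoteDatabaseError < StandardError; end

  # nil and "" cover socket-style URLs such as "postgres:///caster_dev".
  LOCAL_HOSTS = ["localhost", "127.0.0.1", "::1", nil, ""].freeze

  # Returns true for a local database; raises RemoteDatabaseError otherwise.
  def self.check!(database_url)
    # URI#hostname strips the brackets from IPv6 literals like "[::1]".
    host = URI.parse(database_url).hostname
    return true if LOCAL_HOSTS.include?(host)

    raise RemoteDatabaseError,
          "MLS exports must stay local; refusing database host #{host.inspect}"
  end
end
```

A Rake task could call `LocalOnlyGuard.check!(ENV.fetch("DATABASE_URL"))` before ingesting; how the real app resolves its database configuration is not documented here.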

Source documents:

Stack

Ruby on Rails 8 — pipeline framework
PostgreSQL — primary data store
Rake tasks — pipeline execution
RSpec — test framework (bundle exec rspec)

Pipeline

CSV → Ingest → Validate → Normalize → Store → Snapshot → Aggregate
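The arrow diagram implies strict ordering: each stage consumes the previous stage's output, and because Validate fails loudly on drift, a failure should stop everything downstream. A minimal sketch of that shape, assuming each stage can be modeled as a callable; `PipelineSketch` and its handler hash are hypothetical stand-ins for the project's actual Rake tasks and services (`Ingester`, `Normalizer`, and so on).

```ruby
# HYPOTHETICAL sketch: the six stages as an ordered list of callables.
# The real project wires stages through Rake tasks and service objects.
class PipelineSketch
  STAGES = %i[ingest validate normalize store snapshot aggregate].freeze

  attr_reader :trace

  # handlers: { stage_name => callable taking the previous stage's output }
  def initialize(handlers)
    missing = STAGES - handlers.keys
    raise ArgumentError, "missing handlers: #{missing.join(', ')}" unless missing.empty?

    @handlers = handlers
    @trace = []
  end

  # Threads input through every stage in order. If a handler raises
  # (e.g. header drift during :validate), the error propagates and no
  # later stage runs; @trace records which stages were entered.
  def call(input)
    STAGES.reduce(input) do |data, stage|
      @trace << stage
      @handlers.fetch(stage).call(data)
    end
  end
end

# Usage: pass-through handlers everywhere except a validate step that fails.
passthrough = PipelineSketch::STAGES.to_h { |stage| [stage, ->(data) { data }] }
failing = PipelineSketch.new(passthrough.merge(validate: ->(_) { raise "header drift" }))
begin
  failing.call([{ "id" => 1 }])
rescue RuntimeError => e
  puts "#{e.message}; stages entered: #{failing.trace.inspect}"
  # prints "header drift; stages entered: [:ingest, :validate]"
end
```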

Stage	Description
Ingest	Raw CSV row preserved as-is in `raw_listings`
Validate	Feed profile checked against CSV headers — loud failure on drift
Normalize	Raw fields mapped to canonical schema via `FeedProfile`
Store	Canonical record written to `listings`
Snapshot	Append-only record written to `listing_snapshots`
Aggregate	Query objects — read-only market signals
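The Validate stage compares the feed profile with the CSV headers. A minimal sketch of such a drift check, assuming the expected headers are available as an array of strings; `check_header_drift!`, `HeaderDriftError`, and the column names in the usage are hypothetical, and treating unexpected extra columns as drift (not only missing ones) is an assumption about what "drift" covers.

```ruby
# HYPOTHETICAL sketch of a header-drift check. In the project this role is
# played by the FeedProfile validator; names and rules here are assumptions.
class HeaderDriftError < StandardError; end

# Compares expected feed columns with the export's header row (both trimmed
# of surrounding whitespace). Missing AND unexpected columns count as drift.
def check_header_drift!(expected_headers, csv_headers)
  trim = ->(header) { header.to_s.strip }
  expected = expected_headers.map(&trim)
  actual = csv_headers.map(&trim)

  missing = expected - actual
  unexpected = actual - expected
  return true if missing.empty? && unexpected.empty?

  raise HeaderDriftError,
        "feed drift: missing #{missing.inspect}, unexpected #{unexpected.inspect}"
end

# Usage: a renamed column fails loudly instead of normalizing to nil.
begin
  check_header_drift!(["MLS #", "List Price"], ["MLS #", "ListPrice"])
rescue HeaderDriftError => e
  puts e.message # prints: feed drift: missing ["List Price"], unexpected ["ListPrice"]
end
```

Failing on the whole header row, before any row is normalized, is what keeps a silently renamed Matrix column from producing a batch of nil prices.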

Status

Phase	Description	Status
1 — Rails Scaffold + Schema	Rails app + all five migrations	✅ Complete
2 — Feed Profile	`FeedProfile`, `FeedColumn`, validator	✅ Complete
3 — Ingest Layer	`RawListing` model, `Ingester` service, Rake task	✅ Complete
4 — Normalization Layer	`Normalizer`, `ListingNormalizer`, snapshots	✅ Complete
5 — Aggregate Layer	`MarketSummaryQuery`, `PriceTrendQuery`	✅ Complete
6 — Pipeline Wiring	`caster:run`, `caster:validate`, full pipeline wiring	✅ Complete

Setup

bundle install
rails db:create db:migrate
rails db:seed

Usage

# Run full pipeline (quotes keep zsh from treating the brackets as a glob)
rails "caster:run[path/to/export.csv]"

# Validate only (no ingestion)
rails "caster:validate[path/to/export.csv]"

# Query market data (Rails console)
MarketSummaryQuery.new(zip_code: "94131").call
MarketSummaryQuery.new(area_name: "Sunnyside", status: "S").call
PriceTrendQuery.new(zip_code: "94131").call
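The query objects above take keyword filters and return read-only summaries. To illustrate that interface without a database, here is a hypothetical in-memory stand-in; the class name `MarketSummarySketch`, the record keys (`zip_code`, `area_name`, `status`, `price_cents`), and the returned fields (`count`, `median_price_cents`) are assumptions, not the real `MarketSummaryQuery` output.

```ruby
# HYPOTHETICAL in-memory stand-in for a read-only query object. The real
# MarketSummaryQuery reads PostgreSQL; keys and output fields are assumptions.
class MarketSummarySketch
  # Keyword filters mirror the console calls; nil filters are ignored.
  def initialize(listings, zip_code: nil, area_name: nil, status: nil)
    @listings = listings
    @filters = { zip_code: zip_code, area_name: area_name, status: status }.compact
  end

  # Returns a fresh summary hash; never mutates the input records.
  def call
    rows = @listings.select { |listing| @filters.all? { |key, value| listing[key] == value } }
    prices = rows.map { |listing| listing[:price_cents] }.compact.sort
    { count: rows.size, median_price_cents: median(prices) }
  end

  private

  # Median of a sorted integer array; even counts use integer division,
  # so a half-cent result rounds down. Returns nil when there are no prices.
  def median(sorted)
    return nil if sorted.empty?

    mid = sorted.size / 2
    sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
  end
end
```

Usage mirrors the console calls, e.g. `MarketSummarySketch.new(listings, area_name: "Sunnyside", status: "S").call`, where `listings` is an array of hashes with the keys above. Keeping prices as integer cents avoids float rounding in the median.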

Data

Manual CSV exports from MLSListings / Matrix. Place exports in data/. Raw rows are preserved in raw_listings — never overwritten.
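The same never-overwrite rule applies to `listing_snapshots`, which the pipeline table describes as append-only, and it is what makes point-in-time queries possible: the state of a listing at time t is its latest snapshot recorded at or before t. A minimal sketch in plain Ruby; the `Snapshot` struct, its fields, and `as_of` are hypothetical.

```ruby
# HYPOTHETICAL sketch of a point-in-time lookup over append-only snapshots.
# Field names are assumptions; the real table is listing_snapshots.
Snapshot = Struct.new(:listing_id, :recorded_at, :status, :price_cents, keyword_init: true)

# Latest snapshot for listing_id recorded at or before `at`, or nil if the
# listing had no snapshot yet. Nothing is updated in place, so the state at
# any past timestamp can be reconstructed.
def as_of(snapshots, listing_id, at)
  snapshots
    .select { |snap| snap.listing_id == listing_id && snap.recorded_at <= at }
    .max_by(&:recorded_at)
end

history = [
  Snapshot.new(listing_id: 1, recorded_at: Time.utc(2024, 3, 1), status: "A", price_cents: 120_000_000),
  Snapshot.new(listing_id: 1, recorded_at: Time.utc(2024, 4, 15), status: "A", price_cents: 115_000_000),
  Snapshot.new(listing_id: 1, recorded_at: Time.utc(2024, 5, 20), status: "S", price_cents: 118_000_000)
]
puts as_of(history, 1, Time.utc(2024, 4, 30)).price_cents # prints 115000000
```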

Roadmap, refactor + parity backlog

All open work is tracked in issues:

roadmap — future features (seed automation, comps/absorption/inventory queries, multi-feed support, API, scheduled ingestion)
refactor — Sandi/Olsen audit secondary findings (S1–S6)
parity — convention parity with SABER (D1–D6; D5 = S5)
test — test coverage gaps

Cross-project view: Personal Workboard.

The four primary items from the refactor wave have shipped: ListingNormalizer driven off feed_columns, the ListingScope/Cents extraction, snapshot creation extracted from Normalizer, and a send-dispatch registry replacing case-on-type dispatch. The remaining open items are smaller cleanups.
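For readers unfamiliar with the send-dispatch pattern, here is a minimal sketch of a registry that replaces a case statement on column type, combined with normalization driven off a list of column definitions. Everything in it is hypothetical: `FieldCaster`, `FeedColumnSketch`, `normalize_row`, the type names, and the parsing rules are assumptions, not the project's code.

```ruby
# HYPOTHETICAL sketch of a send-dispatch registry for field casting.
# Type names, caster rules, and class names are assumptions.
class FieldCaster
  # Column type => private caster method. Adding a type means adding one
  # entry and one method, rather than another `when` branch.
  REGISTRY = {
    money: :cast_cents,
    integer: :cast_integer,
    string: :cast_string
  }.freeze

  def cast(type, raw)
    method_name = REGISTRY.fetch(type) { raise ArgumentError, "no caster for #{type.inspect}" }
    send(method_name, raw) # send, not public_send: the casters are private
  end

  private

  # "$1,250,000.50" -> 125_000_050 (integer cents); blank -> nil.
  # Rational avoids float rounding error.
  def cast_cents(raw)
    text = raw.to_s.delete("$,").strip
    return nil if text.empty?

    (Rational(text) * 100).round
  end

  # "1,200" -> 1200; blank -> nil.
  def cast_integer(raw)
    text = raw.to_s.delete(",").strip
    text.empty? ? nil : Integer(text, 10)
  end

  # Trims whitespace; blank -> nil.
  def cast_string(raw)
    text = raw.to_s.strip
    text.empty? ? nil : text
  end
end

# A column maps a CSV header to a canonical attribute and a type.
FeedColumnSketch = Struct.new(:source, :target, :type)

# Builds a canonical attribute hash from one raw CSV row.
def normalize_row(row, columns, caster = FieldCaster.new)
  columns.to_h { |col| [col.target, caster.cast(col.type, row[col.source])] }
end
```

`REGISTRY.fetch` makes an unknown column type fail loudly rather than silently yielding nil, which matches the pipeline's fail-on-drift posture.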
