muni-intel
NYC government procurement intelligence — stitching data across public sources to build a unified vendor/entity view.
Project Roadmap
| # | Phase | Status | Details |
|---|---|---|---|
| 1 | Postgres + schema | Done | Neon DB, Drizzle ORM, pg_trgm indexes |
| 2 | Checkbook NYC ingestion | Done | XML API, contracts + spending, rate-limited pagination |
| 3 | MWBE Directory ingestion | Done | Socrata API, certified firm list with ethnicity/industry |
| 4 | Entity resolution v1 | Done | Normalization-key exact + trigram fuzzy matching |
| 5 | Full multi-agency Checkbook pull | Done | 1.108M contracts across 69 agencies (the API's reported baseline of 5.67M was misleading; the real dataset is ~1.1M) |
| 6 | Web explorer frontend | Done | Next.js on Vercel, 5 explorer pages |
| 7 | City Record procurement notices | Done | Socrata API, solicitations + awards + intents |
| 8 | Resolution candidate review UI | Done | Web tab to approve/reject fuzzy-match candidates; approvals merge entities live |
| 9 | Idempotent upserts + 3x/day cron | Done | Natural-key unique indexes, upsert on conflict; cron-refresh runs at 06:00, 14:00, and 22:00 UTC |
| 10 | NYC Campaign Finance ingestion | Done | NYCCFB Follow-the-Money CSV — donors to NYC candidates, joins the entity graph |
| 11 | Persistent review decisions | Done | entity_review_decisions table — approve/reject/revert survive re-resolve; replayed at end of each run |
| 12 | Four-way entity resolution | Done | Checkbook + MWBE + City Record + Campaign Finance (org-only donors) merged by normalization key; trigram fallback for Checkbook & City Record |
| 13 | Nightly Checkbook + resolve in cron | Done | 02:00 UTC full Checkbook pull; 06:00, 14:00, and 22:00 UTC light refresh (MWBE + City Record + CFB) + resolve replay |
| 14 | Resolution — Checkbook↔City Record fuzzy + auto-approve | Done | Stage 2b extends trigram across the two NYC sources; unambiguous pairs ≥0.96 auto-approve and persist as decisions |
| 15 | PASSPort ingestion | Skipped | NYC PASSPort has no public API; ASP.NET WebForms + auth required for the useful data. Skipped for now — duplicates City Record on the unauthenticated side, and VENDEX is a better direct target. |
| 16 | LLM-assisted disambiguation (Stage 4) | Done | Claude Haiku 4.5 classifies uncertain fuzzy pairs → match/no_match/uncertain. 1,778 classified at $1.88 total; ~76% of pairs auto-resolved. |
| 17 | Additional data sources | Not Started | VENDEX responsibility filings, lobbyist disclosures, subcontractor data |
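The auto-approve rule from phase 14 can be sketched as follows. This is a minimal illustration, not the project's code: it assumes "unambiguous" means that each entity appears in exactly one candidate pair at or above the 0.96 threshold, and the `CandidatePair` shape and the `selectAutoApprovals` name are hypothetical.

```typescript
// Hypothetical shape of a fuzzy-match candidate between the two NYC sources.
interface CandidatePair {
  checkbookId: string;
  cityRecordId: string;
  similarity: number; // trigram similarity in [0, 1]
}

// Threshold taken from the roadmap (phase 14).
const AUTO_APPROVE_THRESHOLD = 0.96;

// Returns the pairs that are safe to auto-approve under the assumed rule:
// the pair scores at or above the threshold, and neither side appears in
// any other pair that also scores at or above the threshold.
function selectAutoApprovals(
  pairs: CandidatePair[],
  threshold: number = AUTO_APPROVE_THRESHOLD
): CandidatePair[] {
  const strong = pairs.filter((p) => p.similarity >= threshold);

  const checkbookCounts = new Map<string, number>();
  const cityRecordCounts = new Map<string, number>();
  strong.forEach((p) => {
    checkbookCounts.set(p.checkbookId, (checkbookCounts.get(p.checkbookId) || 0) + 1);
    cityRecordCounts.set(p.cityRecordId, (cityRecordCounts.get(p.cityRecordId) || 0) + 1);
  });

  return strong.filter(
    (p) =>
      checkbookCounts.get(p.checkbookId) === 1 &&
      cityRecordCounts.get(p.cityRecordId) === 1
  );
}

// Usage: only a/x is approved. b has two strong candidates (ambiguous),
// and c/w falls below the threshold, so both are left for review.
const approved = selectAutoApprovals([
  { checkbookId: "a", cityRecordId: "x", similarity: 0.98 },
  { checkbookId: "b", cityRecordId: "y", similarity: 0.97 },
  { checkbookId: "b", cityRecordId: "z", similarity: 0.965 },
  { checkbookId: "c", cityRecordId: "w", similarity: 0.9 },
]);
console.log(approved);
```

Pairs that fail either check would stay in the review queue rather than being rejected, which keeps the auto-approve step conservative.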
Data Sources
Checkbook NYC: NYC Comptroller financial transparency data. Registered expense contracts and spending transactions via XML API.
MWBE Directory: NYC SBS certified M/WBE, EBE, and LBE firms. Serves as the ground-truth firm list for entity resolution.
City Record: official NYC procurement notices (solicitations, awards, and intents) published in the City Record.
Campaign Finance: NYCCFB itemized contributions from donors to NYC candidates and committees, taken from the Follow-the-Money bulk CSV archive.
PASSPort: NYC Procurement and Sourcing Solutions Portal. Solicitation details, vendor registrations, award history. No public API, so ingestion would require a scraper; currently skipped (roadmap phase 15).
Entity Resolution
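Four-way resolution across Checkbook, MWBE, City Record, and Campaign Finance (org donors). Stage 1: normalization-key exact match. Stage 2: trigram fuzzy match; Stage 2b extends it across Checkbook and City Record and auto-approves unambiguous pairs scoring ≥0.96. Stage 3: replay of persisted review decisions. Stage 4: LLM-assisted classification of uncertain fuzzy pairs as match, no_match, or uncertain.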
CFB individual donors (c_code=IND) are excluded from the entity graph — only org donors (LLC, Corp, PAC, Union, etc.) participate in matching.
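The first two stages can be illustrated with a short sketch. It is an approximation under stated assumptions, not the project's implementation: the suffix list, the 0.6 fuzzy threshold, and the function names are hypothetical, and `trigramSimilarity` only loosely mirrors how PostgreSQL's pg_trgm scores strings (padded per-word trigrams, shared count divided by union count). In the real pipeline the fuzzy stage presumably runs in the database against the pg_trgm indexes.

```typescript
// Hypothetical legal-suffix list; the project's actual normalization rules
// are not shown in this document.
const LEGAL_SUFFIXES = new Set<string>([
  "inc", "incorporated", "llc", "corp", "corporation",
  "co", "company", "ltd", "lp", "llp",
]);

// Stage 1 key: lowercase, strip punctuation, drop legal suffixes.
function normalizationKey(name: string): string {
  return name
    .toLowerCase()
    .replace(/&/g, " and ")
    .replace(/[^a-z0-9]+/g, " ")
    .trim()
    .split(" ")
    .filter((token) => token.length > 0 && !LEGAL_SUFFIXES.has(token))
    .join(" ");
}

// Builds the trigram set of a string: each word is padded with two leading
// spaces and one trailing space, as pg_trgm does.
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  words.forEach((word) => {
    const padded = "  " + word + " ";
    for (let i = 0; i + 3 <= padded.length; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  });
  return grams;
}

// Stage 2 score: shared trigrams divided by the size of the union.
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(a);
  const gb = trigrams(b);
  if (ga.size === 0 || gb.size === 0) return 0;
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

type NameMatch = { kind: "exact" | "fuzzy" | "none"; score: number };

// Tries the exact key match first and falls back to trigram similarity.
// The default threshold is an illustrative value, not the project's setting.
function matchNames(a: string, b: string, fuzzyThreshold: number = 0.6): NameMatch {
  const ka = normalizationKey(a);
  const kb = normalizationKey(b);
  if (ka.length > 0 && ka === kb) return { kind: "exact", score: 1 };
  const score = trigramSimilarity(ka, kb);
  return score >= fuzzyThreshold ? { kind: "fuzzy", score } : { kind: "none", score };
}

console.log(matchNames("ACME Plumbing LLC", "Acme Plumbing, Inc."));
console.log(matchNames("Acme Plumbing", "Acme Plumbng"));
```

Running the exact check before the fuzzy one keeps the cheap, high-precision case out of the review queue; only names whose keys differ are scored and, if they clear the threshold, surfaced as candidates.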
Latest Ingestion Runs
| Source | Status | Fetched | Inserted | Started | Completed |
|---|---|---|---|---|---|
| checkbook contracts | completed | 640,000 | 640,000 | 18h ago | 16h ago |
| checkbook spending | completed | 15,386 | 15,386 | 61d ago | 61d ago |
| mwbe directory | completed | 11,556 | 11,556 | 6h ago | 6h ago |
| city record | completed | 103,892 | 103,892 | 6h ago | 6h ago |
| campaign finance | completed | 259,967 | 259,967 | 6h ago | 6h ago |
Top Agencies by Contract Value
| Agency | Contracts | Total Value |
|---|---|---|
|  | 1,688,719 | $56.1B |
|  | 236 | $33.0B |
|  | 141,195 | $23.8B |
|  | 1,050 | $11.3B |
|  | 3,152 | $9.3B |
|  | 3,310 | $5.1B |
|  | 24,604 | $2.3B |
|  | 31,491 | $1.3B |
|  | 21,762 | $1.2B |
|  | 184 | $606.1M |