Data dictionary
Field-level reference for the licensed CSV snapshots. Column types, source citations, and update cadence per dataset. This page is public; the equivalent PDF is attached to every signed MDLA for compliance reference.
General conventions
- All timestamps are UTC. Dates use YYYY-MM-DD.
- Numeric fields avoid scientific notation. BTC is expressed to 8 decimals, USD to 2.
- CSVs use comma delimiter, RFC 4180 quoting, UTF-8, CRLF line endings.
- Header row is always present. Column order is stable across snapshots.
- Null is represented as empty string (not literal "NULL").
- Each delivery includes a manifest.json with row counts, SHA-256 checksums per file, and snapshot timestamp.
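The conventions above map fairly directly onto Python's standard library. The following is a minimal sketch, not an official client: the function names, the sample data, and the choice of Decimal for numeric fields are assumptions made for illustration. The layout of manifest.json is not specified on this page, so the checksum helper only computes a digest that a consumer would compare against whatever value the manifest lists for that file.

```python
import csv
import hashlib
import io
from decimal import Decimal


def read_snapshot_csv(text_stream, numeric_fields=()):
    """Yield one dict per data row.

    Empty strings become None (the documented null convention) and the
    named numeric fields are parsed as Decimal to avoid float rounding.
    """
    reader = csv.DictReader(text_stream)  # header row is always present
    for row in reader:
        record = {}
        for key, value in row.items():
            if value == "":
                record[key] = None
            elif key in numeric_fields:
                record[key] = Decimal(value)
            else:
                record[key] = value
        yield record


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical two-column excerpt with CRLF line endings, for illustration only.
sample = "id,name,ticker,holdings_btc\r\n1,Example Corp,,12.50000000\r\n"
rows = list(
    read_snapshot_csv(io.StringIO(sample, newline=""), numeric_fields={"holdings_btc"})
)
digest = sha256_hex(sample.encode("utf-8"))
```

When reading from disk, opening the file with `open(path, newline="", encoding="utf-8")` lets the csv module handle the CRLF line endings itself.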
entities.csv
One row per tracked entity (company, government, ETF). Represents the current state as of the snapshot timestamp.
Cadence: Weekly (Mon 08:00 UTC) or daily for Data Feed tier
| Field | Type | Description |
|---|---|---|
| id | uuid | Stable CorpStacking identifier. Persistent across snapshots. |
| name | string | Official entity name as registered (SEC for US, equivalent for non-US). |
| ticker | string or null | Stock ticker where applicable. Null for governments and private entities. Source: SEC EDGAR, Yahoo Finance. |
| country | ISO-3166-alpha-2 | Country of primary listing or domicile. |
| entity_type | enum | One of: corporate, government, etf, dao, private. |
| holdings_btc | numeric | Current BTC holdings. Derived from parsed purchases; reconciled nightly against bitbo.io. Source: SEC filings + cross-check. Note: 0 if entity has divested. |
| holdings_eth | numeric | Current ETH holdings (for multi-token tracked entities). Source: SEC filings. |
| total_cost_usd | numeric | Sum of parsed purchase USD amounts at time of trade. |
| avg_price_usd | numeric | Weighted average acquisition cost per unit, in USD. |
| last_purchase | date (YYYY-MM-DD) | Filing date of the most recent detected purchase. |
| confidence | numeric 0–1 | Confidence score from the daily cross-check job. 1.0 = exact match with reference source. Source: data_confidence_log. |
| first_detected | timestamptz | When this entity first appeared in our ingestion pipeline. |
| updated_at | timestamptz | Last time any field on this row was mutated. |
purchases.csv
One row per detected purchase. Append-only in the sense that rows are never deleted; when the source filing is amended, the existing row is corrected in place.
Cadence: Weekly rollup or real-time (API, Data Feed tier)
| Field | Type | Description |
|---|---|---|
| id | uuid | Purchase row identifier. |
| entity_id | uuid | FK into entities.csv.id. |
| token_id | string | Asset identifier: "bitcoin", "ethereum", etc. |
| amount | numeric | Units purchased (BTC, ETH, etc.). |
| usd_amount | numeric | USD value at time of trade, from filing if disclosed, else computed from price_per_unit * amount. |
| price_per_unit | numeric | Average price per unit in USD. |
| total_holdings_after | numeric | Entity holdings after this purchase, as reported in the filing. |
| purchase_date | date | Trade date reported in the filing. May precede filing_date. |
| filing_date | date | Date the disclosing filing was submitted to the SEC. |
| sec_accession | string | SEC accession number (e.g. 0001104659-26-000001). Joinable against EDGAR bulk data. |
| filing_url | url | Direct link to the source filing on sec.gov. |
| source | enum | Provenance of this row: sec_edgar, manual, etf_disclosure, press_release. |
| fear_greed_value | integer 0–100 or null | Crypto fear & greed index value at purchase_date, for behavioral context. Source: alternative.me. |
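The fallback described for usd_amount (use the filing's figure if disclosed, otherwise price_per_unit * amount) can be sketched as below. This is an illustration only: the function name is hypothetical, and rounding the computed value to cents with half-up rounding is an assumption based on the "USD to 2 decimals" convention, not a documented rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def resolve_usd_amount(disclosed_usd, price_per_unit, amount):
    """Return the disclosed USD value if present, else price_per_unit * amount.

    Inputs are Decimal or None. The computed value is rounded to 2 decimals
    (assumption: half-up rounding; the page does not state a rounding mode).
    """
    if disclosed_usd is not None:
        return disclosed_usd
    if price_per_unit is None or amount is None:
        return None  # nothing to compute from
    product = price_per_unit * amount
    return product.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical values for illustration.
computed = resolve_usd_amount(None, Decimal("60000.00"), Decimal("1.50000000"))
disclosed = resolve_usd_amount(Decimal("100.00"), Decimal("60000.00"), Decimal("1.5"))
```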
holdings_history.csv
Daily holdings roll-up. Joins to entities.csv via entity_id and allows longitudinal analysis without recomputing from purchases.
Cadence: Daily (nightly snapshot)
| Field | Type | Description |
|---|---|---|
| date | date | Snapshot date. One row per entity per token per day. |
| entity_id | uuid | FK to entities.csv.id. |
| token_id | string | Asset identifier. |
| amount | numeric | Holdings on this date. |
| total_cost_usd | numeric | Cumulative acquisition cost in USD. |
| avg_price_usd | numeric | Cost basis per unit at end-of-day. |
| spot_price_usd | numeric | Closing market price for the asset, for mark-to-market calculations. |
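As an example of the mark-to-market use that spot_price_usd is intended for, the sketch below derives market value and unrealized gain for a single holdings_history.csv row. The function name and sample figures are hypothetical, and the arithmetic (amount times spot price, minus cumulative cost) is the conventional calculation rather than a formula stated on this page.

```python
from decimal import Decimal


def mark_to_market(amount, spot_price_usd, total_cost_usd):
    """Return (market_value_usd, unrealized_gain_usd) for one row."""
    market_value = amount * spot_price_usd
    unrealized_gain = market_value - total_cost_usd
    return market_value, unrealized_gain


# Hypothetical row: 2 units held, closing price 50,000 USD, 80,000 USD cumulative cost.
value, gain = mark_to_market(Decimal("2"), Decimal("50000.00"), Decimal("80000.00"))
```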
Known caveats
- Non-US entities: coverage is best-effort. Filings from non-US regulators (FSA, ASIC, JFSA) are manually triaged and can lag real-time.
- ETF holdings: sourced from daily prospectus disclosures where available. For smaller ETFs we fall back to weekly filings — treat same-day ETF numbers as provisional.
- Cost basis on spun-off entities: when a holding entity reorganizes, cost basis is carried forward at book value. Mark-to-market consumers should recompute from purchase rows if allocation clarity matters.
- Amended filings: when a company files a correction (e.g. 8-K/A), we update the original purchase row in place and bump updated_at. Consumers relying on append-only semantics should key on (id, updated_at).
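One way a consumer might apply the (id, updated_at) keying from the amended-filings caveat is to keep, for each purchase id, only the row with the most recent updated_at across successive snapshots. The sketch below assumes each purchase row carries an updated_at value serialized as an ISO 8601 string with an explicit UTC offset; the purchases.csv table above does not list that column or its format, so treat both as assumptions. The function name and sample rows are hypothetical.

```python
from datetime import datetime


def latest_versions(rows):
    """Map each purchase id to the row with the greatest updated_at."""
    latest = {}
    for row in rows:
        stamp = datetime.fromisoformat(row["updated_at"])
        current = latest.get(row["id"])
        if current is None or stamp > datetime.fromisoformat(current["updated_at"]):
            latest[row["id"]] = row
    return latest


# Hypothetical rows from two snapshots: purchase "a" was amended in the second.
snapshot_rows = [
    {"id": "a", "amount": "10", "updated_at": "2026-01-05T08:00:00+00:00"},
    {"id": "b", "amount": "3", "updated_at": "2026-01-05T08:00:00+00:00"},
    {"id": "a", "amount": "12", "updated_at": "2026-01-12T08:00:00+00:00"},
]
current_rows = latest_versions(snapshot_rows)
```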
Accuracy methodology
A nightly cron job (data-confidence) compares our holdings table against a reference source (bitbo.io). For each entity we compute:
confidence = 1 - min(1, abs(our_holdings - reference_holdings) / reference_holdings)
Entities with confidence < 0.995 are flagged in the weekly snapshot's manifest.json under flagged_entities. Historical confidence scores are logged and available on request as an audit trail.
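The confidence formula and the 0.995 flagging threshold can be reproduced with a few lines of Python. This is a sketch for consumers who want to sanity-check the published scores, not the data-confidence job itself. The function names are hypothetical, and the handling of a zero reference value (where the formula divides by zero) is an assumption, since this page does not define that case.

```python
FLAG_THRESHOLD = 0.995  # threshold stated in the methodology


def confidence(our_holdings, reference_holdings):
    """1 - min(1, |ours - reference| / reference), clamped to the range 0..1."""
    if reference_holdings == 0:
        # Assumption: the formula is undefined here; count it as an exact
        # match only when both sides report zero.
        return 1.0 if our_holdings == 0 else 0.0
    relative_error = abs(our_holdings - reference_holdings) / reference_holdings
    return 1 - min(1, relative_error)


def is_flagged(our_holdings, reference_holdings):
    """True when the entity would fall below the flagging threshold."""
    return confidence(our_holdings, reference_holdings) < FLAG_THRESHOLD


# Hypothetical holdings: a 1% gap is flagged, a 0.4% gap is not.
one_percent_gap = is_flagged(99.0, 100.0)
small_gap = is_flagged(99.6, 100.0)
```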