Data dictionary

Field-level reference for the licensed CSV snapshots. It lists column types, source citations, and update cadence for each dataset. This page is public; the equivalent PDF is attached to every signed MDLA for compliance reference.

General conventions

  • All timestamps are UTC. Dates use YYYY-MM-DD.
  • Numeric fields avoid scientific notation. BTC is expressed to 8 decimals, USD to 2.
  • CSVs use comma delimiter, RFC 4180 quoting, UTF-8, CRLF line endings.
  • Header row is always present. Column order is stable across snapshots.
  • Null is represented as empty string (not literal "NULL").
  • Each delivery includes a manifest.json with row counts, SHA-256 checksums per file, and snapshot timestamp.
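
The conventions above can be sketched in Python as follows. This is a minimal illustration, not a reference loader: the sample rows are made up, and the manifest entry and its key names (sha256, row_count) are hypothetical, since the layout of manifest.json is not described on this page.

```python
import csv
import hashlib
import io
from decimal import Decimal


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def read_snapshot_csv(data: bytes) -> list:
    """Parse a snapshot CSV: UTF-8, comma-delimited, header row, empty string = null."""
    # newline="" leaves CRLF handling to the csv module, as RFC 4180 input expects.
    text = io.StringIO(data.decode("utf-8"), newline="")
    rows = []
    for row in csv.DictReader(text):
        # Map the empty-string null convention to Python None.
        rows.append({key: (value if value != "" else None) for key, value in row.items()})
    return rows


# Tiny in-memory stand-in for entities.csv (made-up values, subset of columns).
sample = (
    b"id,name,ticker,holdings_btc\r\n"
    b'e1,"Example, Inc.",EXM,12.50000000\r\n'
    b"e2,Example Government,,0.00000000\r\n"
)

rows = read_snapshot_csv(sample)

# Hypothetical manifest entry; built from the sample so the check is self-contained.
manifest_entry = {"sha256": sha256_hex(sample), "row_count": 2}

assert sha256_hex(sample) == manifest_entry["sha256"]
assert len(rows) == manifest_entry["row_count"]

# Decimal keeps all 8 decimal places instead of rounding through a float.
btc = Decimal(rows[0]["holdings_btc"])
```

Parsing numeric fields as Decimal rather than float is a design choice for this sketch; it preserves the fixed 8-decimal BTC and 2-decimal USD representations exactly.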

entities.csv

One row per tracked entity (company, government, ETF). Represents the current state as of the snapshot timestamp.

Cadence: Weekly (Mon 08:00 UTC), or daily for the Data Feed tier

Field           Type                Description
id              uuid                Stable CorpStacking identifier. Persistent across snapshots.
name            string              Official entity name as registered (SEC for US, equivalent for non-US).
ticker          string | null       Stock ticker where applicable. Null for governments and private entities.
                                    source: SEC EDGAR, Yahoo Finance
country         ISO 3166-1 alpha-2  Country of primary listing or domicile.
entity_type     enum                One of: corporate, government, etf, dao, private.
holdings_btc    numeric             Current BTC holdings. Derived from parsed purchases; reconciled nightly against bitbo.io.
                                    source: SEC filings + cross-check
                                    note: 0 if entity has divested.
holdings_eth    numeric             Current ETH holdings (for multi-token tracked entities).
                                    source: SEC filings
total_cost_usd  numeric             Sum of parsed purchase USD amounts at time of trade.
avg_price_usd   numeric             Weighted average acquisition cost per unit, in USD.
last_purchase   date (YYYY-MM-DD)   Filing date of the most recent detected purchase.
confidence      numeric 0–1         Confidence score from the daily cross-check job. 1.0 = exact match with reference source.
                                    source: data_confidence_log
first_detected  timestamptz         When this entity first appeared in our ingestion pipeline.
updated_at      timestamptz         Last time any field on this row was mutated.

purchases.csv

One row per detected purchase. The file is append-only in the sense that rows are never deleted, but an existing row is corrected in place when its source filing is amended.

Cadence: Weekly rollup or real-time (API, Data Feed tier)

Field                 Type                  Description
id                    uuid                  Purchase row identifier.
entity_id             uuid                  FK into entities.csv.id.
token_id              string                Asset identifier: "bitcoin", "ethereum", etc.
amount                numeric               Units purchased (BTC, ETH, etc.).
usd_amount            numeric               USD value at time of trade, from filing if disclosed, else computed from price_per_unit * amount.
price_per_unit        numeric               Average price per unit in USD.
total_holdings_after  numeric               Entity holdings after this purchase, as reported in the filing.
purchase_date         date                  Trade date reported in the filing. May precede filing_date.
filing_date           date                  Date the disclosing filing was submitted to the SEC.
sec_accession         string                SEC accession number (e.g. 0001104659-26-000001). Joinable against EDGAR bulk data.
filing_url            url                   Direct link to the source filing on sec.gov.
source                enum                  Provenance of this row: sec_edgar, manual, etf_disclosure, press_release.
fear_greed_value      integer 0–100 | null  Crypto fear & greed index value at purchase_date, for behavioral context.
                                            source: alternative.me
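
The usd_amount fallback described above (use the disclosed value if present, otherwise price_per_unit * amount) might look like the sketch below. The function name resolve_usd_amount is hypothetical, and rounding the computed value half-up to 2 decimals is an assumption; this page only states that USD is expressed to 2 decimals.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def resolve_usd_amount(disclosed: Optional[str], price_per_unit: str, amount: str) -> Decimal:
    """Return the disclosed USD value if present, else price_per_unit * amount."""
    if disclosed:
        # The filing disclosed a USD figure; use it as-is.
        return Decimal(disclosed)
    computed = Decimal(price_per_unit) * Decimal(amount)
    # Assumption: round half-up to cents to match the 2-decimal USD convention.
    return computed.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


disclosed_case = resolve_usd_amount("96000.00", "64250.10", "1.50000000")
computed_case = resolve_usd_amount(None, "64250.10", "1.50000000")
```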

holdings_history.csv

Daily holdings roll-up. It joins to entities.csv via entity_id and allows longitudinal analysis without recomputing from purchases.

Cadence: Daily (nightly snapshot)

Field           Type     Description
date            date     Snapshot date. One row per entity per token per day.
entity_id       uuid     FK to entities.csv.id.
token_id        string   Asset identifier.
amount          numeric  Holdings on this date.
total_cost_usd  numeric  Cumulative acquisition cost in USD.
avg_price_usd   numeric  Cost basis per unit at end-of-day.
spot_price_usd  numeric  Closing market price for the asset, for mark-to-market calculations.
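
As an illustration of the mark-to-market use mentioned for spot_price_usd, the sketch below values one holdings_history.csv row. The function name, the output keys, and the sample values are hypothetical; the formulas (market value = amount * spot_price_usd, unrealized gain = market value - total_cost_usd) are the conventional definitions, not something this page specifies.

```python
from decimal import Decimal


def mark_to_market(row: dict) -> dict:
    """Compute market value and unrealized gain/loss for one holdings_history row."""
    amount = Decimal(row["amount"])
    spot = Decimal(row["spot_price_usd"])
    # Quantize to cents to match the 2-decimal USD convention.
    market_value = (amount * spot).quantize(Decimal("0.01"))
    unrealized = market_value - Decimal(row["total_cost_usd"])
    return {"market_value_usd": market_value, "unrealized_pnl_usd": unrealized}


# Made-up row using the column names from the table above.
row = {
    "date": "2026-01-05",
    "entity_id": "example-entity",
    "token_id": "bitcoin",
    "amount": "10.00000000",
    "total_cost_usd": "400000.00",
    "avg_price_usd": "40000.00",
    "spot_price_usd": "45000.00",
}

result = mark_to_market(row)
```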

Known caveats

  • Non-US entities: coverage is best-effort. Filings from non-US regulators (FSA, ASIC, JFSA) are manually triaged and can lag real-time.
  • ETF holdings: sourced from daily prospectus disclosures where available. For smaller ETFs we fall back to weekly filings, so treat same-day ETF numbers as provisional.
  • Cost basis on spun-off entities: when a holding entity reorganizes, cost basis is carried forward at book value. Mark-to-market consumers should recompute from purchase rows if allocation clarity matters.
  • Amended filings: when a company files a correction (e.g. 8-K/A), we update the original purchase row in place and bump updated_at. Consumers relying on append-only semantics should key on (id, updated_at).
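
One way a consumer might handle in-place corrections is to keep the newest version of each purchase row, keyed on (id, updated_at). The sketch below assumes each purchase row is delivered with an updated_at value in ISO 8601 form, as the caveat above implies; that column is not listed in the purchases.csv table, and the function names and sample rows are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit UTC offset."""
    return datetime.fromisoformat(value)


def merge_snapshot(store: dict, rows: list) -> dict:
    """Merge snapshot rows into store, keeping the latest version per purchase id."""
    for row in rows:
        seen = store.get(row["id"])
        # Replace only if the row is new or carries a later updated_at.
        if seen is None or parse_ts(row["updated_at"]) > parse_ts(seen["updated_at"]):
            store[row["id"]] = row
    return store


# Made-up snapshots: p1 is amended in week 2, p2 is newly detected.
week1 = [{"id": "p1", "amount": "100", "updated_at": "2026-01-05T08:00:00+00:00"}]
week2 = [
    {"id": "p1", "amount": "110", "updated_at": "2026-01-12T08:00:00+00:00"},
    {"id": "p2", "amount": "5", "updated_at": "2026-01-12T08:00:00+00:00"},
]

store = merge_snapshot({}, week1)
store = merge_snapshot(store, week2)
```

Re-applying an older snapshot after a newer one leaves the store unchanged, so the merge is safe to replay.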

Accuracy methodology

A nightly cron job (data-confidence) compares our holdings table against a reference source (bitbo.io). For each entity we compute:

confidence = 1 - min(1, abs(our_holdings - reference_holdings) / reference_holdings)

Entities with confidence < 0.995 are flagged in the weekly snapshot's manifest.json under flagged_entities. Historical confidence scores are logged and available on request as an audit trail.
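
The formula and threshold above can be worked through in a few lines of Python. This is a sketch, not the data-confidence job itself: the function names are hypothetical, and the handling of reference_holdings = 0 (where the formula divides by zero) is an assumption, since this page does not say what the job does in that case.

```python
from decimal import Decimal

FLAG_THRESHOLD = Decimal("0.995")


def confidence(our_holdings: Decimal, reference_holdings: Decimal) -> Decimal:
    """confidence = 1 - min(1, abs(ours - reference) / reference)."""
    if reference_holdings == 0:
        # Assumption: the formula is undefined here; treat 0 vs 0 as an exact match.
        return Decimal(1) if our_holdings == 0 else Decimal(0)
    relative_error = abs(our_holdings - reference_holdings) / reference_holdings
    return Decimal(1) - min(Decimal(1), relative_error)


def is_flagged(score: Decimal) -> bool:
    """An entity is flagged when its confidence is strictly below the threshold."""
    return score < FLAG_THRESHOLD


# Made-up holdings, in BTC.
exact = confidence(Decimal("10000"), Decimal("10000"))       # relative error 0
borderline = confidence(Decimal("9950"), Decimal("10000"))   # relative error 0.005
off = confidence(Decimal("9900"), Decimal("10000"))          # relative error 0.01
far_off = confidence(Decimal("25000"), Decimal("10000"))     # relative error capped at 1
```

Because the comparison is strict, a 0.5% deviation yields a confidence of exactly 0.995 and is not flagged, while a 1% deviation is. Deviations of 100% or more all collapse to a confidence of 0.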
