Alex Henry Christensen - Economist & Data Analyst

Market Intelligence Platform

Every data system has a label problem. Official categories describe what a company was, not what it's exposed to right now. A gaming company pivots its treasury to Ethereum - it still says "Electronic Gaming" in every database. A photonics maker, a memory chip supplier, a data-centre cooling firm, and a power equipment company all start moving together - placed in different sectors by every standard classification scheme, driven by the same underlying bet on AI infrastructure.

Industry labels like GICS and SIC are slow to change by design. But risk doesn't wait. When a company's actual exposure diverges from its label, standard tools miss it - and so does anyone relying on them.

The platform is built around one question: what is this company actually, right now? It tracks 5,200 US stocks and reclassifies 4,800+ of them by real risk exposure, extracted from regulatory filings by AI. Unusual price moves are logged daily and processed to surface the narratives that explain why stocks move together, and when that changes.

Data Sources

Polygon.io: market data · 5,200 tickers daily

SEC EDGAR: 10-K filings · 4,800+ companies

Claude API: classification + narrative analysis

→

Pipeline

Market Data Ingestion: fetch · split-adjust · parquet storage

Risk Classification: 15 sectors · 125 clusters · AI on filings

Anomaly Detection: rolling z-scores · full universe daily

Narrative Engine: event log · AI-powered clustering

→

Output

Anomaly Screener: flags unusual moves across all stocks

Research Dashboard: serverless · no runtime server

Narrative Reports: AI-clustered cross-market themes

Platform in action

Use Case 01: GameSquare is classified as an esports company, but its Ethereum treasury means it moves with crypto as well as gaming. The platform surfaces both exposures and flags the uncertainty.

GAME in platform screener showing classification panel with ETH price driver

Raw classification database showing GAME confidence 0.72 and price drivers

Step 01 / 02

The platform view

GameSquare is labelled an esports gaming company - reasonable on the surface. But the platform read its regulatory filing and found something the label misses: the company holds Ethereum as a treasury asset, meaning its stock moves with crypto prices, not just gaming ad spend. No standard classification captures both at once.

Ticker: GAME
Sector: media_and_entertainment
Industry: esports gaming
Key driver: ETH price

Step 02 / 02

What the system extracted

The raw classification data shows confidence at 0.72 - the system flagging that this company doesn't fit cleanly into one category. The price drivers were extracted directly from Item 1A of the 10-K filing. The rationale explains why: dual exposure to both gaming fundamentals and ETH price complicates any single label.

Confidence: 0.72
Source: 10-K Item 1A
Signal: dual exposure
Classified: 2026-04-12
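
A classification record like the one above could be modelled as follows. This is a minimal sketch, not the platform's actual schema: the `Classification` class, its field names, and the 0.8 review threshold are all assumptions made for illustration. Only the field values come from the GAME example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Classification:
    """One classified company (hypothetical structure)."""
    ticker: str
    sector: str
    industry: str
    price_drivers: list
    confidence: float
    source: str
    classified: date

    def needs_review(self, threshold: float = 0.8) -> bool:
        # Assumed rule: anything below the threshold is treated as a
        # company that does not fit cleanly into one category.
        return self.confidence < threshold


game = Classification(
    ticker="GAME",
    sector="media_and_entertainment",
    industry="esports gaming",
    price_drivers=["ETH price"],
    confidence=0.72,
    source="10-K Item 1A",
    classified=date(2026, 4, 12),
)

print(game.ticker, game.needs_review())
```

Keeping confidence on the record, rather than discarding it after classification, is what lets a screener show the uncertainty next to the label.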

Use Case 02: POET Technologies surged 45%. The platform traced it back to an AI infrastructure narrative that had been building for six weeks across 20+ stocks. Here's the full trail.

Anomaly screener showing POET flagged at z-score 11.1

Event log entry for POET with AI-generated summary

Narrative cards showing AI & Energy Infrastructure Buildout

Opened narrative with candlestick charts per stock

Step 01 / 04

Spot the anomaly

Every stock is scored daily against its own history. When something moves far outside its normal range, it rises to the top - filterable by sector and company size.

Ticker: POET
Z-score: 2.7
1-month return: +44.8%
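
Scoring a stock against its own history can be sketched as a rolling z-score. The example below is an illustration under stated assumptions, not the platform's code: the 60-day window, the use of daily returns as the scored quantity, and the function name `rolling_zscores` are all hypothetical.

```python
from statistics import mean, stdev


def rolling_zscores(returns, window=60):
    """Z-score of each return against the preceding `window` returns.

    Entries without a full history window are None.
    """
    scores = []
    for i, r in enumerate(returns):
        if i < window:
            scores.append(None)
            continue
        history = returns[i - window:i]  # excludes the return being scored
        mu, sigma = mean(history), stdev(history)
        scores.append((r - mu) / sigma if sigma > 0 else None)
    return scores


# Toy series: 60 quiet days alternating +1% / -1%, then one +12% move.
daily_returns = [0.01, -0.01] * 30 + [0.12]
z = rolling_zscores(daily_returns, window=60)
print(z[-1])
```

Excluding the current day from its own window matters: otherwise a large move inflates the standard deviation it is measured against and understates how unusual it was.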

Step 02 / 04

Read the event

Every anomalous move is logged with an AI-generated summary and keyword tags - capturing what drove it, and which narrative thread it belongs to. The trail stays readable weeks later.

Trigger: CFO disputes short-seller
Mechanism: Short squeeze + redomicile
Narrative link: AI & Energy Infra

Step 03 / 04

Surface the narrative

Co-moving stocks are clustered into evolving themes. POET joins 20+ tickers inside "AI & Energy Infrastructure Buildout" - a narrative Claude has versioned eight times since March.

Active tickers: 20+
Version: v8
Running since: 2026-03-09

Step 04 / 04

The full paper trail

Open any narrative to see the full thesis, candlestick charts for every member stock, and the event that pulled each one in - readable long after the move has passed. What the system shows isn't just that POET moved - it's that it moved as part of a theme that had been building for weeks. That context is the analysis.

Stock shown: POET Technologies
Z-score at entry: 2.7
Trigger: CFO disputes short-seller
Date: 2026-04-21

5,200 US stocks tracked

4,800+ companies reclassified

125 industry clusters

anomaly events logged

Risk-Based Classification

4,800+ companies reclassified by what they're actually exposed to - not what their official sector label says. AI reads regulatory filings to capture modern business models that standard industry codes miss.

Anomaly Detection

Every stock scored daily against its own history - unusual activity surfaces before it becomes obvious, across all 5,200 tickers at once.

SEC Filing Pipeline

Automated regulatory filing extraction across 4,800+ companies - classified into 15 sectors and 125 industry clusters using Claude AI, updated as new filings are published.

AI Narrative Tracking

Every unusual price move is logged with an AI-generated summary and linked to a narrative thread - so the context behind a move stays readable weeks after it happened, not just in the moment.

Serverless Architecture

A 44 MB data payload compresses to 8 MB with gzip. The dashboard is a self-contained HTML file with no runtime server dependency: instant load, fully offline-capable.
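
One way to ship a compressed payload inside a single HTML file is to gzip the JSON and inline it as base64. The sketch below assumes that approach; the source does not say how the dashboard embeds its data, and `pack_payload`, `unpack_payload`, and the toy records are hypothetical.

```python
import base64
import gzip
import json


def pack_payload(records):
    """Serialise to compact JSON, gzip it, and base64-encode it for inlining."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    packed = gzip.compress(raw, compresslevel=9)
    return raw, packed, base64.b64encode(packed).decode("ascii")


def unpack_payload(encoded):
    """Inverse of pack_payload, used here only to check the round trip."""
    return json.loads(gzip.decompress(base64.b64decode(encoded)))


# Made-up, repetitive records standing in for the real dashboard data.
records = [{"ticker": f"T{i:04d}", "z": 0.0, "sector": "example"} for i in range(5000)]
raw, packed, encoded = pack_payload(records)

# Assumed embedding: the encoded payload sits in a script tag in the page.
html = f'<script id="payload" type="application/octet-stream">{encoded}</script>'
print(len(raw), len(packed))
```

In this setup the browser has to decode and decompress the payload itself, for example with the `DecompressionStream` API, which is what keeps the page free of any server-side step.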

End-to-End Pipeline

From raw API data to interactive dashboard: Polygon.io OHLCV ingestion, split adjustment, parquet storage, SQLite databases, and Python-to-JS compilation.
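
The split-adjustment step can be illustrated with a small back-adjustment routine. This is a sketch under assumptions: the input shapes, the `split_adjust` name, and the convention that a ratio of 2.0 means a 2-for-1 split are chosen for the example and are not taken from the pipeline.

```python
def split_adjust(closes, splits):
    """Back-adjust closes so prices before each split match the later scale.

    closes: list of (date, close) in ascending date order, ISO date strings.
    splits: dict mapping a split's execution date to its ratio
            (2.0 for a 2-for-1 split, 0.1 for a 1-for-10 reverse split).
    """
    adjusted = []
    factor = 1.0
    # Walk backwards: every bar before a split date is divided by its ratio.
    for date, close in reversed(closes):
        adjusted.append((date, close / factor))
        if date in splits:
            factor *= splits[date]
    adjusted.reverse()
    return adjusted


closes = [("2026-01-02", 100.0), ("2026-01-05", 102.0), ("2026-01-06", 51.0)]
splits = {"2026-01-06": 2.0}  # toy 2-for-1 split
adjusted = split_adjust(closes, splits)
print(adjusted)
```

Without this step, the unadjusted drop from 102 to 51 would register as a 50% one-day loss and surface as a false anomaly.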

Stack

Python · Pandas · NumPy · SQLite · Parquet · JavaScript · Polygon.io API · SEC EDGAR API · Anthropic Claude API · Econometrics

P-02 — EARNINGS RESEARCH ENGINE

Earnings Research Engine

When the screener flags something, the question I actually want to answer isn't "what moved?" - I already know that. It's "what do I know about this company, and what is its peer group saying?" Answering that properly means reading: earnings releases, call transcripts, management commentary on the specific thing that matters - margins, bookings, certification milestones. I was doing that manually. The problem compounds when the question is cross-company: "what are battery companies saying about production ramp in Q1 2026?" isn't answerable from a screener.

This system changes that. Earnings releases and transcripts are ingested into a vector database, then retrieved, reranked, and synthesised by a 4-layer pipeline. The classifier from P-01 provides the routing layer: when a question arrives, the system finds the right cluster of companies first, then retrieves from accumulated filings for those companies specifically. Company knowledge compounds over time; a spike on BW today draws on every quarterly filing the system has seen.

Design decisions

Standard taxonomies (GICS, SIC) group companies by industry, not by what actually drives their earnings. The peer groups here come from clustering 10-K text embeddings with HDBSCAN instead. HDBSCAN is density-based: the number of clusters and their boundaries emerge from the embedding structure of the 10-K text rather than a fixed schema. It also marks genuinely ambiguous companies as noise rather than forcing them into a nearest cluster - which matters for conglomerates and companies mid-pivot.

Searching every stored chunk for every query is expensive and floods the context with irrelevant companies. Each cluster is represented by the mean of its member embeddings (its centroid). At query time the question is compared against cluster centroids first - O(k) instead of O(n) - routing retrieval to the relevant sector of the corpus before any document-level search begins.
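
The centroid routing described above can be sketched in a few lines. The vectors, cluster names, and function names here are toy assumptions; real embeddings would have hundreds of dimensions and come from an embedding model.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors (the cluster centroid)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centroids(cluster_embeddings):
    return {cid: mean_vector(vecs) for cid, vecs in cluster_embeddings.items()}


def route(query_vec, centroids, top_k=1):
    """Compare the query with k centroids instead of every stored chunk."""
    scored = sorted(
        ((cosine(query_vec, c), cid) for cid, c in centroids.items()),
        reverse=True,
    )
    return [(cid, score) for score, cid in scored[:top_k]]


# Toy 3-dimensional member embeddings for two hypothetical clusters.
clusters = {
    "next_gen_battery": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "industrial_boiler": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = build_centroids(clusters)
best = route([0.85, 0.15, 0.05], centroids)
print(best)
```

Document-level search then runs only over chunks belonging to the routed cluster, so the expensive step scales with the cluster size rather than the whole corpus.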

Embedding models are trained on documents, not questions. A query like "what drove ENPH's margin compression?" occupies a different region of the embedding space than the earnings passage that answers it. HyDE resolves this asymmetry: the LLM generates what a relevant earnings passage would plausibly contain, and that hypothetical document is embedded in place of the raw question. It lands close to real earnings chunks, which raised company identification from 88.9% to 100% in testing.
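
The HyDE flow can be sketched with the LLM, the embedding model, and the vector search passed in as plain functions. Everything below the `hyde_search` function is a toy stand-in: the bag-of-words "embedding", the canned generator, the prompt wording, and the two made-up chunks are assumptions for illustration, not the system's components.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def hyde_search(
    question: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], List[Tuple[str, float]]],
    top_k: int = 30,
) -> List[Tuple[str, float]]:
    """Embed a hypothetical answer passage instead of the raw question."""
    prompt = "Write a short earnings-release passage that would answer:\n" + question
    hypothetical = generate(prompt)
    return search(embed(hypothetical), top_k)


# --- toy stand-ins so the sketch runs without any external service ---
VOCAB = ["margin", "tariff", "inventory", "bookings"]


def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_generate(prompt):
    return "Gross margin fell as tariff costs rose and inventory was written down"


CHUNKS = {
    "solar_co_call": "margin pressure from tariff costs and inventory charges",
    "boiler_co_release": "bookings and backlog grew on new utility orders",
}


def toy_search(vec, top_k):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = [(name, dot(vec, toy_embed(text))) for name, text in CHUNKS.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


hits = hyde_search(
    "what drove ENPH's margin compression?",
    toy_generate, toy_embed, toy_search, top_k=1,
)
print(hits)
```

The hypothetical passage does not need to be factually right; it only needs to use the vocabulary and phrasing of a real earnings passage so that its embedding lands near one.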

Bi-encoder vector similarity is fast but approximate - query and document are embedded independently, so their interaction is never modelled. Cross-encoders read query and document together, which is more accurate but too slow to run against a full corpus. The pipeline uses the bi-encoder to retrieve a large candidate set, then re-scores with zerank-2 (a cross-encoder) before passing the top results to the LLM. In testing, re-ranking lifted relevant results from 30.0% to 33.3% on its own, and from 38.9% to 41.9% when added on top of query expansion.
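
The retrieve-then-rerank pattern looks roughly like this. The two scorers are toy stand-ins (word overlap for the bi-encoder, a phrase-aware score for the cross-encoder), and the 30-candidate / 15-result defaults are an assumption that mirrors the "15 chunks (of 30)" shown in the demo panel; no zerank-2 call is made here.

```python
def retrieve_then_rerank(query, chunks, fast_score, slow_score,
                         n_candidates=30, n_final=15):
    """Stage 1: cheap scorer over everything. Stage 2: costly scorer over the shortlist."""
    candidates = sorted(chunks, key=lambda c: fast_score(query, c), reverse=True)
    candidates = candidates[:n_candidates]
    reranked = sorted(candidates, key=lambda c: slow_score(query, c), reverse=True)
    return reranked[:n_final]


def overlap(query, chunk):
    # Toy "bi-encoder": counts shared words, blind to word order.
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def phrase_aware(query, chunk):
    # Toy "cross-encoder": reads both texts together and rewards the exact phrase.
    return overlap(query, chunk) + (5 if query.lower() in chunk.lower() else 0)


chunks = [
    "outlook for bookings remains cautious",
    "management raised the bookings outlook",
    "weather was mild this quarter",
]
top = retrieve_then_rerank("bookings outlook", chunks, overlap, phrase_aware,
                           n_candidates=2, n_final=1)
print(top)
```

The first two chunks tie under the cheap scorer; only the second stage, which sees query and chunk together, separates them.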

TRY IT Pre-loaded responses from actual system runs

Query modes: RAG Spike Analysis · Narrative Query · Ticker Query

RAG Spike Analysis

QUERY CONTEXT

TICKER

BW

CLUSTER

213 — Industrial Boiler

QUERY

Q1 2026 bookings, backlog and revenue outlook

RETRIEVED

15 chunks (of 30)

RERANK

zerank-2

TOP SOURCES

Source	Type	Date	Score
BW Q1 2026 Earnings	earnings	2026-05-07	0.921
BW Q1 2026 Call	transcript	2026-05-07	0.904
BW Q4 2025 Earnings	earnings	2026-02-28	0.867
BW Q4 2025 Call	transcript	2026-02-28	0.851
BW Q3 2025 Earnings	earnings	2025-11-04	0.834

ANALYSIS OUTPUT

SIGNAL

Stock +12.4% on Q1 2026 earnings beat. Revenue $123.4M vs $118.2M expected; bookings $189M, backlog raised to $1.2B.

CATALYST

Base Electron technology order from European utility — first commercial deployment outside North America. Management raised full-year guidance.

PRIOR PATTERN

3 prior spikes on bookings announcements (Q3 2024, Q1 2025, Q3 2025). Pattern: fades 60–70% within 5 sessions, then holds above pre-spike level.

RISKS

Execution risk on Base Electron commercial ramp. Utility capex cycles can delay bookings recognition by 1–2 quarters.

Narrative Query

QUERY ROUTING

QUERY

What are battery companies saying about production ramp in Q1 2026?

CLUSTER HIT

91 — Next-Gen Battery

ROUTING SCORE

0.847

PEER GROUP — 12 COMPANIES

MVST — Microvast
QS — QuantumScape
SLDP — Solid Power
SES — SES AI Corp
FREYR — FREYR Battery
NKLA — Nikola

SYNTHESISED ANSWER

Battery companies in Q1 2026 showed divergent trajectories. MVST reported strongest progress: cell energy density reached 285 Wh/kg, guiding 2 GWh capacity by Q3. QS flagged yield improvement to 48% on solid-state separators but acknowledged commercial-scale validation remains 12–18 months out. SLDP cited customer qualification delays pushing first commercial revenue to late 2026.

SOURCES

MVST Q1 2026 (2026-05-12) · QS Q1 2026 (2026-04-28) · SLDP Q1 2026 (2026-05-06)

EVALUATION Systematic testing — from 30% to 42% relevant results

Every change was tested against 38 queries with known correct answers. The table shows what each technique added — and one case where adding more made things worse.

What was tested	Right companies found	Relevant results returned	vs. baseline
Baseline — no optimisation	88.9%	30.0%	—
+ re-ranking layer	88.9%	33.3%	+3.3%
+ query expansion via LLM	100.0%	38.9%	+8.9%
Both together ★ final system	100.0%	41.9%	+11.9%
Query expansion at two stages	100.0%	41.4%	+11.4% (−0.5% vs. final system)
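
The two columns in the table could be computed along these lines. The metric definitions are assumptions, since the source does not define them: "right companies found" is taken as the share of queries whose expected tickers all appear in the results, and "relevant results returned" as the share of returned chunks judged relevant. The data is made up.

```python
def evaluate(runs):
    """runs: list of dicts with 'expected' (set of tickers) and
    'returned' (list of (ticker, is_relevant) pairs)."""
    company_hits = 0
    relevant = 0
    total = 0
    for run in runs:
        found = {ticker for ticker, _ in run["returned"]}
        if run["expected"] <= found:  # every expected company was retrieved
            company_hits += 1
        relevant += sum(1 for _, ok in run["returned"] if ok)
        total += len(run["returned"])
    return {
        "companies_found": company_hits / len(runs),
        "relevant_results": relevant / total if total else 0.0,
    }


# Two toy queries with hypothetical tickers and relevance judgements.
runs = [
    {"expected": {"AAA"},
     "returned": [("AAA", True), ("AAA", False), ("CCC", False)]},
    {"expected": {"AAA", "BBB"},
     "returned": [("AAA", True), ("CCC", False)]},
]
metrics = evaluate(runs)
print(metrics)
```

Running the same fixed query set after every pipeline change is what makes a regression like the two-stage expansion row visible at all.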