Open to work · Copenhagen

Economist. Data Analyst.

I spent a decade reading markets.

Then I built the tools to analyse them properly.

A decade of trading teaches you to be wrong quickly and update. That turns out to be the same discipline that makes data analysis useful rather than decorative - knowing which question is worth asking, holding it lightly, and changing your view when the data says to.

MSc Economics from the University of Copenhagen (2023-2025), following a BSc in Economics and ten years as an independent trader and analyst. Financial markets are the domain I built in; the underlying skills transfer wherever data is complex and voluminous.

55.6761° N, 12.5683° E - Copenhagen, DK

Market Intelligence Platform

Every data system has a label problem. Official categories describe what a company was, not what it's exposed to right now. A gaming company pivots its treasury to Ethereum - it still says "Electronic Gaming" in every database. A photonics maker, a memory chip supplier, a data-centre cooling firm, and a power equipment company all start moving together - placed in different sectors by every standard classification scheme, driven by the same underlying bet on AI infrastructure.

Industry labels like GICS and SIC are slow to change by design. But risk doesn't wait. When a company's actual exposure diverges from its label, standard tools miss it - and so does anyone relying on them.

The platform is built around one question: what is this company actually, right now? 5,200 US stocks reclassified by real risk exposure - extracted from regulatory filings by AI - with unusual price moves logged daily and processed to surface the narratives that explain why stocks move together, and when that changes.

Data Sources
Polygon.io: market data · 5,200 tickers daily
SEC EDGAR: 10-K filings · 4,800+ companies
Claude API: classification + narrative analysis
Pipeline
Market Data Ingestion: fetch · split-adjust · parquet storage
Risk Classification: 15 sectors · 125 clusters · AI on filings
Anomaly Detection: rolling z-scores · full universe daily
Narrative Engine: event log · AI-powered clustering
Output
Anomaly Screener: flags unusual moves across all stocks
Research Dashboard: serverless · no runtime server
Narrative Reports: AI-clustered cross-market themes
Platform in action
Use Case 01 GameSquare: classified as an esports company - but its Ethereum treasury means it moves with crypto, not gaming. The platform surfaces both exposures and flags the uncertainty.
[Screenshot: GAME in platform screener showing classification panel with ETH price driver]
[Screenshot: raw classification database showing GAME confidence 0.72 and price drivers]
Step 01 / 02

The platform view

GameSquare is labelled an esports gaming company - reasonable on the surface. But the platform read its regulatory filing and found something the label misses: the company holds Ethereum as a treasury asset, meaning its stock moves with crypto prices, not just gaming ad spend. No standard classification captures both at once.

Ticker: GAME
Sector: media_and_entertainment
Industry: esports gaming
Key driver: ETH price
Step 02 / 02

What the system extracted

The raw classification data shows confidence at 0.72 - the system flagging that this company doesn't fit cleanly into one category. The price drivers were extracted directly from Item 1A of the 10-K filing. The rationale explains why: dual exposure to both gaming fundamentals and ETH price complicates any single label.

Confidence: 0.72
Source: 10-K Item 1A
Signal: dual exposure
Classified: 2026-04-12
Use Case 02 POET Technologies surged 45% - the platform traced it back to an AI infrastructure narrative that had been building for six weeks across 20+ stocks. Here's the full trail.
[Screenshot: anomaly screener showing POET flagged at z-score 11.1]
[Screenshot: event log entry for POET with AI-generated summary]
[Screenshot: narrative cards showing AI & Energy Infrastructure Buildout]
[Screenshot: opened narrative with candlestick charts per stock]
Step 01 / 04

Spot the anomaly

Every stock is scored daily against its own history. When something moves far outside its normal range, it rises to the top - filterable by sector and company size.

Ticker: POET
Z-score: 2.7
1-month return: +44.8%
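
The daily scoring described in this step can be illustrated with a small rolling z-score. This is only a sketch under assumptions: the 60-day window, the use of daily returns as the scored series, and the function name `rolling_zscores` are illustrative choices, not details taken from the platform.

```python
from statistics import mean, stdev

def rolling_zscores(returns, window=60):
    """Score each return against the mean and standard deviation of the
    `window` returns that precede it. The current value is excluded from
    its own baseline so a large move cannot dampen its own score."""
    scores = []
    for i, r in enumerate(returns):
        history = returns[max(0, i - window):i]
        if len(history) < window:
            scores.append(None)  # not enough history to score yet
            continue
        sigma = stdev(history)
        scores.append((r - mean(history)) / sigma if sigma > 0 else None)
    return scores

# Hypothetical daily returns: a quiet stretch followed by one large move.
daily_returns = [0.01, -0.01] * 30 + [0.20]
z = rolling_zscores(daily_returns, window=60)
print(round(z[-1], 1))  # the final day stands far outside its own history
```

Scoring each stock against its own history, rather than against the market, is what lets a normally quiet stock and a normally volatile one share a single screener.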
Step 02 / 04

Read the event

Every anomalous move is logged with an AI-generated summary and keyword tags - capturing what drove it, and which narrative thread it belongs to. The trail stays readable weeks later.

Trigger: CFO disputes short-seller
Mechanism: Short squeeze + redomicile
Narrative link: AI & Energy Infra
Step 03 / 04

Surface the narrative

Co-moving stocks are clustered into evolving themes. POET joins 20+ tickers inside "AI & Energy Infrastructure Buildout" - a narrative Claude has versioned eight times since March.

Active tickers: 20+
Version: v8
Running since: 2026-03-09
Step 04 / 04

The full paper trail

Open any narrative to see the full thesis, candlestick charts for every member stock, and the event that pulled each one in - readable long after the move has passed. What the system shows isn't just that POET moved - it's that it moved as part of a theme that had been building for weeks. That context is the analysis.

Stock shown: POET Technologies
Z-score at entry: 2.7
Trigger: CFO disputes short-seller
Date: 2026-04-21
Risk-Based Classification
4,800+ companies reclassified by what they're actually exposed to - not what their official sector label says. AI reads regulatory filings to capture modern business models that standard industry codes miss.
Anomaly Detection
Every stock scored daily against its own history - unusual activity surfaces before it becomes obvious, across all 5,200 tickers at once.
SEC Filing Pipeline
Automated regulatory filing extraction across 4,800+ companies - classified into 15 sectors and 125 industry clusters using Claude AI, updated as new filings are published.
AI Narrative Tracking
Every unusual price move is logged with an AI-generated summary and linked to a narrative thread - so the context behind a move stays readable weeks after it happened, not just in the moment.
Serverless Architecture
44MB data payload compressed to 8MB gzip. Self-contained HTML dashboard with no runtime server dependency - instant load, fully offline-capable.
End-to-End Pipeline
From raw API data to interactive dashboard: Polygon.io OHLCV ingestion, split adjustment, parquet storage, SQLite databases, and Python-to-JS compilation.
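
One way a self-contained, server-free dashboard like the one described above could carry its data is to gzip the payload and embed it in the HTML file as base64 text. The sketch below is an assumption about the general approach, not the platform's actual build step; `build_dashboard_html`, `extract_payload`, and the `payload` element id are hypothetical names.

```python
import base64
import gzip
import json

def build_dashboard_html(payload: dict) -> str:
    """Serialise the data, gzip it, and embed it as base64 in a single HTML file."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    packed = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return (
        "<!doctype html><html><body>"
        f'<script id="payload" type="application/octet-stream">{packed}</script>'
        "</body></html>"
    )

def extract_payload(html: str) -> dict:
    """Reverse the embedding (stands in for what the page's own script would do)."""
    start = html.index(">", html.index('id="payload"')) + 1
    end = html.index("</script>", start)
    return json.loads(gzip.decompress(base64.b64decode(html[start:end])))

# Hypothetical miniature payload; the real one is described as 44MB before compression.
sample = {"tickers": ["GAME", "POET"], "close": [[1.0, 1.1], [5.0, 7.2]]}
html = build_dashboard_html(sample)
restored = extract_payload(html)
```

Base64 adds roughly a third to the compressed size, which is the price of keeping everything in one file. In the browser, the decoding half could plausibly be handled by the standard `DecompressionStream` API.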
Stack
Python · Pandas · NumPy · SQLite · Parquet · JavaScript · Polygon.io API · SEC EDGAR API · Anthropic Claude API · Econometrics
P-02 — EARNINGS RESEARCH ENGINE

Earnings Research Engine

When the screener flags something, the question I actually want to answer isn't "what moved?" - I already know that. It's "what do I know about this company, and what is its peer group saying?" Answering that properly means reading: earnings releases, call transcripts, management commentary on the specific thing that matters - margins, bookings, certification milestones. I was doing that manually. The problem compounds when the question is cross-company: "what are battery companies saying about production ramp in Q1 2026?" isn't answerable from a screener.

This system changes that. Earnings releases and transcripts are ingested into a vector database, then retrieved, reranked, and synthesised by a 4-layer pipeline. The classifier from P-01 (the Market Intelligence Platform) provides the routing layer: when a question arrives, the system finds the right cluster of companies first, then retrieves from accumulated filings for those companies specifically. Company knowledge compounds over time; a spike on BW today draws on every quarterly filing the system has seen.

RAG - INDEX
SEC 10-K: 4,685 annual filings
zembed-1: mean-pool embedding
HDBSCAN: density clustering
company_classification (vector index · ChromaDB): sector · cluster · price drivers · one embedding per company · cluster centroids stored
SEC 8-K: earnings + transcripts
chunk + zembed-1: paragraph-level embedding
trading_research (vector index · ChromaDB): earnings · transcripts · one embedding per chunk · ticker · date · doc_type

RAG - RETRIEVE & GENERATE
USER QUERY: narrative mode · ticker mode
S1 · CENTROID ROUTING: cosine vs cluster centroids · keyword label boost
S2 · COMPANY POOL: tickers from matched clusters
S3 · SEMANTIC RETRIEVAL: HyDE (LLM generates query doc) · embed → search trading_research · filtered to S2 ticker set · returns top-30 chunks
S4 · ZERANK-2: cross-encoder reranking · 30 chunks → top 15 · full query-chunk interaction · relevance-ordered for LLM
LLM SYNTHESIS: 15 chunks in context · answers from context only · cites sources [1][2] · narrative mode: cross-ticker Q&A · ticker mode: single-company Q&A
OUTPUT: structured analysis
Design decisions
Standard taxonomies (GICS, SIC) group companies by industry, not by what actually drives their earnings. HDBSCAN is density-based: the number of clusters and their boundaries emerge from the embedding structure of the 10-K text rather than a fixed schema. It also marks genuinely ambiguous companies as noise rather than forcing them into a nearest cluster - which matters for conglomerates and companies mid-pivot.
Searching every stored chunk for every query is expensive and floods the context with irrelevant companies. Each cluster is represented by the mean of its member embeddings (its centroid). At query time the question is compared against the cluster centroids first - O(k) comparisons for k clusters instead of O(n) for n stored chunks - routing retrieval to the relevant part of the corpus before any document-level search begins.
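
A minimal sketch of the centroid routing described above, using toy 3-dimensional vectors. The cluster names, tickers, and vectors are invented for illustration, and the keyword label boost mentioned in the pipeline is left out.

```python
import math

def centroid(vectors):
    """Mean of the member embeddings, taken dimension by dimension."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, centroids, top_k=2):
    """Return the names of the top_k clusters whose centroid is closest to the query."""
    ranked = sorted(centroids, key=lambda name: cosine(query_vec, centroids[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical clusters: one embedding per company, as in the index description.
clusters = {
    "battery_makers": {"TICK_A": [0.9, 0.1, 0.0], "TICK_B": [0.8, 0.2, 0.1]},
    "crypto_treasury": {"TICK_C": [0.0, 0.9, 0.3], "TICK_D": [0.1, 0.8, 0.4]},
    "data_centre_cooling": {"TICK_E": [0.1, 0.2, 0.9]},
}
centroids = {name: centroid(list(members.values())) for name, members in clusters.items()}

matched = route([0.85, 0.15, 0.05], centroids, top_k=1)
company_pool = sorted(t for name in matched for t in clusters[name])
```

Here `company_pool` plays the role of the S2 ticker set: document-level search only runs over chunks belonging to those tickers.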
Embedding models are trained on documents, not questions. A query like "what drove ENPH's margin compression?" occupies a different region of the embedding space than the earnings passage that answers it. HyDE resolves this asymmetry: the LLM generates what a relevant earnings passage would plausibly contain, and that hypothetical document is embedded in place of the raw question. It lands close to real earnings chunks; in testing, this raised the share of queries for which the right companies were found from 88.9% to 100%.
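
The HyDE step can be sketched with the LLM, the embedder, and the vector search passed in as plain functions. Everything below the `hyde_search` function is a stand-in: `fake_llm` returns a canned passage, `embed` is a bag-of-words counter rather than zembed-1, and the two chunks are invented toy text, not real filings.

```python
import math
from collections import Counter

def hyde_search(question, generate, embed, search, top_k=30):
    """Embed an LLM-written hypothetical answer instead of the raw question."""
    hypothetical = generate(
        "Write a short passage from an earnings release that would answer: " + question
    )
    return search(embed(hypothetical), top_k)

# --- stand-ins so the sketch runs without any external service ---
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fake_llm(prompt):
    return "gross margin declined due to higher tariffs and lower shipment volumes"

CHUNKS = [
    "gross margin declined as tariffs raised component costs and shipment volumes fell",
    "the board approved a new share repurchase programme",
]

def search(query_vec, top_k):
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]

results = hyde_search("what drove ENPH's margin compression?", fake_llm, embed, search, top_k=1)
```

The hypothetical passage shares vocabulary with the answering chunk that the question itself lacks, which is the asymmetry HyDE is meant to close.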
Bi-encoder vector similarity is fast but approximate - query and document are embedded independently, so their interaction is never modelled. Cross-encoders read query and document together, which is more accurate but too slow to run against a full corpus. The pipeline uses the bi-encoder to retrieve a large candidate set, then re-scores with zerank-2 (a cross-encoder) before passing the top results to the LLM. On its own, re-ranking lifted relevant results from 30.0% to 33.3%; added on top of HyDE, it lifted them from 38.9% to 41.9%.
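
The two-stage pattern above, a cheap scorer to build a candidate set followed by a more careful scorer to order it, can be sketched as follows. Both scorers are toy stand-ins chosen only to show the shape: `fast_score` counts shared words (ignoring word order, loosely like independent embeddings), and `slow_score` counts shared two-word phrases (so it sees how query and chunk interact). Neither is zerank-2, and the chunks are invented.

```python
def retrieve_then_rerank(query, chunks, fast_score, slow_score, n_candidates=30, n_final=15):
    """Stage 1: keep the n_candidates best chunks by the cheap score.
    Stage 2: re-order only those candidates by the expensive score."""
    candidates = sorted(chunks, key=lambda c: fast_score(query, c), reverse=True)[:n_candidates]
    return sorted(candidates, key=lambda c: slow_score(query, c), reverse=True)[:n_final]

def fast_score(query, chunk):
    # Toy cheap scorer: number of query words that appear anywhere in the chunk.
    return len(set(query.split()) & set(chunk.split()))

def bigrams(text):
    words = text.split()
    return set(zip(words, words[1:]))

def slow_score(query, chunk):
    # Toy careful scorer: number of two-word phrases shared by query and chunk.
    return len(bigrams(query) & bigrams(chunk))

chunks = [
    "margin expanded while compression of delivery times continued",
    "management attributed margin compression to tariffs",
    "the dividend was unchanged",
]
top = retrieve_then_rerank("margin compression", chunks, fast_score, slow_score,
                           n_candidates=2, n_final=1)
```

The first two chunks tie on the cheap score because both contain "margin" and "compression"; only the second stage notices that one of them actually discusses margin compression.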
Systematic testing — from 30% to 42% relevant results

Every change was tested against 38 queries with known correct answers. The table shows what each technique added — and one case where adding more made things worse.

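| What was tested | Right companies found | Relevant results returned | Change in relevant results |
|---|---|---|---|
| Baseline - no optimisation | 88.9% | 30.0% | baseline |
| + re-ranking layer | 88.9% | 33.3% | +3.3 pts vs. baseline |
| + query expansion via LLM | 100.0% | 38.9% | +8.9 pts vs. baseline |
| Both together ★ final system | 100.0% | 41.9% | +11.9 pts vs. baseline |
| Query expansion at two stages | 100.0% | 41.4% | −0.5 pts vs. final system |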
Stack
Python · ChromaDB · zembed-1 · zerank-2 · HDBSCAN · Claude API · Flask · SEC EDGAR API · HyDE · RAG

Let's talk.

Spent a decade developing analytical depth independently - then formalised it with an MSc. Now looking for a role where that depth gets sharpened through collaboration and insights actually reach the people making decisions.