Public GitHub Pages live

Data engineering, taught through shipped systems.

This repo starts with Python, schemas, storage, processing, and streaming. It moves into two serious builds: a trading platform and a retrieval lab that tags content, indexes it, and answers with citations. Then it goes somewhere most curricula never do: a research wing where fractals, indexing, lineage, and institutional governance run in working code.

Choose a path · Explore the studios

84 notebooks

17 chapters

7 interactive studios

2 applied systems

Applied System A

Stock Trading Platform

Next.js, Flask, Hasura, Postgres, a Streamlit teaching surface, and market analysis through Ask Warren.

Applied System B

Retrieval Lab

Source adapters, content tagging, Chroma-backed retrieval, FastAPI answers, trace rails, and bounded agents.

Python · data modeling · Parquet · Kafka · GraphQL · embeddings · vector search · grounded answers · dbt · MDM · lineage · visibility graphs · box-covering · institutional theory · Hilbert curves · Z-order · HNSW · Hurst partitioning

Start here

Three ways into the repo.

Pick the surface that matches your attention span. Each path leads back into the same notebook spine.

Fastest payoff

Streamlit Dashboard

Open the teaching surface without Docker. Good if you want immediate interaction, market views, and a guided UI.

Open folder

System view

Full Stack Platform

See the repo as a product: frontend, API, data model, orchestration, and the shape of a real application.

Open folder

Modern capstone

Retrieval Lab

Go straight to content adapters, tagging, vector search, citations, and bounded agents.

Open folder

Curriculum arc

From foundations to systems to research.

The sequence is cumulative. The later chapters only make sense because the earlier ones teach the contracts underneath them.

00–05

Foundations

Python, data modeling, storage, processing, and streaming.

Browse notebooks

APIs and frontend

REST, GraphQL, Postgres, Hasura, and the first real product surface.

Open chapter

07–08

Embeddings and LLMs

Text comparison, embeddings, vector-store bridge notebooks, Anthropic setup, and LangChain-era tooling.

Open bridge

Quality and governance

Checks, dbt, and an MDM and governance primer that stabilizes the enterprise layer.

Open chapter

Retrieval systems and agents

A practical capstone: evaluate retrieval, blend lexical and vector signals, then ingest content, index it, retrieve it, and answer with evidence.

Open capstone
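To make "evaluate retrieval" concrete, here is a minimal sketch of two common ranking metrics, MRR and NDCG. It is not the capstone's own code: the function names, the document ids, and the gain values are all assumptions chosen for illustration.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant result, or 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def dcg(gains):
    """Discounted cumulative gain: position i (0-based) is divided by log2(i + 2)."""
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))


def ndcg_at_k(ranked_ids, gains_by_id, k):
    """DCG of the top-k results divided by the DCG of the best possible top-k."""
    gains = [gains_by_id.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    ideal = sorted(gains_by_id.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical queries: each pair is (system ranking, set of relevant ids).
runs = [(["a", "b", "c"], {"b"}), (["x", "y"], {"x"})]
mrr = mean_reciprocal_rank(runs)  # (1/2 + 1/1) / 2
```

MRR only rewards the first relevant hit, while NDCG credits graded relevance at every position, so the two can disagree on which ranker is better.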

Fractals and governance

Mandelbrot intuition, fractal descriptors, pattern recognition, MDM, scale-sensitive governance, and duplicate-cluster risk.

Open studio
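One fractal descriptor can be sketched in a few lines: the box-counting dimension, estimated as the least-squares slope of log(occupied boxes) against log(1/box size). This is an illustrative sketch, not the chapter's implementation; the function names and the integer-grid test shapes are assumptions.

```python
import math


def box_count(points, size):
    """Count the distinct size-by-size boxes that contain at least one point."""
    return len({(x // size, y // size) for x, y in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(size) versus log(1 / size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity shapes with known dimensions: a line (1) and a filled square (2).
sizes = [1, 2, 4, 8, 16]
diagonal = [(i, i) for i in range(64)]
square = [(x, y) for x in range(64) for y in range(64)]
line_dim = box_counting_dimension(diagonal, sizes)
square_dim = box_counting_dimension(square, sizes)
```

Checking the estimator against shapes whose dimension is known is a cheap guard before applying it to messier data, where the slope depends heavily on the range of box sizes chosen.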

Fractal graphs

Time-series visibility graphs, box-covering on networks, and lineage as a stewardship object. Closes by naming the failure modes.

Open studio
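A time-series visibility graph can be sketched briefly. The version below uses the natural visibility criterion, which is an assumption about which variant the chapter teaches: two samples are linked when the straight line between them passes strictly above every sample in between. The function name is hypothetical.

```python
def natural_visibility_edges(series):
    """Return edges (i, j), i < j, between samples that can 'see' each other.

    Samples i and j are visible when every intermediate sample k lies strictly
    below the straight line joining (i, series[i]) and (j, series[j]).
    This brute-force check is cubic in the worst case; fine for short series.
    """
    n = len(series)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.append((i, j))
    return edges


# A valley lets the endpoints see each other; a peak blocks them.
valley_edges = natural_visibility_edges([3, 1, 2])
peak_edges = natural_visibility_edges([1, 3, 1])
```

Adjacent samples are always linked, so the graph is connected; the structure worth studying comes from the longer-range edges that valleys open up and peaks close off.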

Fractal governance

Institutional theory, fractal-graph descriptors, and AI governance braided into one chapter. Pressure fields, the decoupling lens, the regulation cascade.

Open studio

Fractal indexing

Hilbert and Z-order curves from scratch, the Faloutsos selectivity oracle, a pure-Python HNSW, a DuckDB Liquid Clustering benchmark, and Hurst-driven partitioning.

Open studio
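As a taste of "from scratch", here is a minimal Z-order (Morton) encoder: it interleaves the bits of two coordinates so that sorting by the resulting code tends to keep nearby cells close together. The function name and the 16-bit default are assumptions, not the chapter's code, and the Hilbert curve is not covered by this sketch.

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a single Morton code.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Sorting 2-D cells by their Morton code traces the familiar Z pattern.
cells = [(1, 1), (0, 1), (1, 0), (0, 0)]
z_sorted = sorted(cells, key=lambda cell: z_order(*cell))
```

A range scan over the sorted codes then touches a mostly contiguous run of pages for a small 2-D query box, which is the property page pruning relies on.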

Orchestration

A tiny asset-graph orchestrator from scratch: topological materialization, idempotent backfills, sensors and freshness, the repo's own dbt graph, retries, and the blast radius of a failure. Maps to Dagster; closes on the failure modes.

Open studio
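Topological materialization and blast radius fit in a short sketch. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) rather than a hand-rolled sort, and the asset names, the `materialize` and `blast_radius` helpers, and the in-memory store are all hypothetical; the chapter's orchestrator may be shaped differently.

```python
from graphlib import TopologicalSorter


def materialize(deps, compute, store=None):
    """Materialize assets upstream-first.

    deps maps asset -> set of upstream assets; compute maps asset -> function
    taking a dict of upstream values. Assets already in `store` are skipped,
    which makes a rerun a no-op.
    """
    store = {} if store is None else store
    order = list(TopologicalSorter(deps).static_order())
    for asset in order:
        if asset in store:
            continue
        inputs = {up: store[up] for up in deps.get(asset, ())}
        store[asset] = compute[asset](inputs)
    return order, store


def blast_radius(deps, failed):
    """Return every asset downstream of `failed`, directly or transitively."""
    downstream = {}
    for asset, upstreams in deps.items():
        for up in upstreams:
            downstream.setdefault(up, set()).add(asset)
    seen, stack = set(), [failed]
    while stack:
        for child in downstream.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical three-asset chain.
deps = {
    "raw_events": set(),
    "sorted_events": {"raw_events"},
    "event_count": {"sorted_events"},
}
compute = {
    "raw_events": lambda inputs: [3, 1, 2],
    "sorted_events": lambda inputs: sorted(inputs["raw_events"]),
    "event_count": lambda inputs: len(inputs["sorted_events"]),
}
order, store = materialize(deps, compute)
```

Skipping assets that are already in the store is the simplest form of idempotency; a real orchestrator would also key on partitions and code versions before deciding a rerun is safe to skip.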

Interactive studios

Seven labs that run in your browser.

No install, no notebook server. Each studio makes one hard idea visible at the speed of a click, then routes you back to the notebooks that built it.

Chapters 07–08

Embeddings Bridge

From raw text to vectors to retrieval, traced step by step.

Enter studio

Chapter 10

Ranking Lab

Hybrid search and reranking, scored with MRR and NDCG.

Enter studio

Chapter 11

Fractal Studio

Mandelbrot, box-counting, and duplicate-cluster stewardship.

Enter studio

Chapter 12

Fractal Graphs

Visibility graphs, box-covering, and lineage blast radius.

Enter studio

Chapter 13

Governance Studio

Pressure fields, decoupling, and the regulation cascade.

Enter studio

Chapter 14

Indexing Studio

Hilbert curves, page pruning, and the HNSW climber.

Enter studio

Chapter 15

Orchestration Studio

Materialize an asset graph, backfill partitions, trace a failure's blast radius.

Enter studio

Research wing

Where engineering meets institutional theory.

Chapters 11 through 14 are not a decorative detour. They take one mathematical lens, self-similarity across scale, and run it through four enterprise objects: entity clusters, lineage networks, governance regimes, and the indexes you ship every day. Apache Iceberg added Hilbert clustering in 2025; the math behind it is thirty years old. Each chapter closes with a notebook that names its own failure modes, because descriptors that cannot fail cannot inform.

Start at Chapter 11 · Open the notebooks

Applied systems

Two builds, two different forms of rigor.

One teaches product architecture. The other teaches grounded retrieval. Both are concrete enough to expose weak assumptions.

System A

Stock Trading Platform

Next.js dashboard and Streamlit teaching layer
Flask API, Hasura GraphQL, Postgres, SQLite support
Ask Warren for market analysis and portfolio views

Frontend · API · Streamlit

System B

Retrieval Lab

Generic source adapter contract with NPS as the worked example
Normalization, rule-assisted and model-assisted tagging
Chroma retrieval, FastAPI answers, citations, and bounded tools

Lab code · Docs · Demo route

Run it

Fast ways to get your hands on it.

No abstraction here. These are the entry commands that matter.

Streamlit

git clone https://github.com/mhdk1602/python_training.git
cd python_training/streamlit-app
pip install -r requirements.txt
streamlit run app.py

Full stack

git clone https://github.com/mhdk1602/python_training.git
cd python_training
docker compose up -d

Chapter 10

cd python_training/chapter-10-rag-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
uvicorn main:app --reload --port 8001

Keep reading

The repo still matters.

This site is the front door. The real substance is still in the notebooks, the code, and the build decisions underneath them.

Read the README · Open GitHub