Data engineering, taught through shipped systems.
This repo starts with Python, schemas, storage, processing, and streaming. It moves into two serious builds: a trading platform and a retrieval lab that tags content, indexes it, and answers with citations. It now also carries an advanced research lens on fractals, pattern recognition, MDM, and governance.
Applied System A
Stock Trading Platform
Next.js, Flask, Hasura, Postgres, a Streamlit teaching surface, and market analysis through Ask Warren.
Applied System B
Retrieval Lab
Source adapters, content tagging, Chroma-backed retrieval, FastAPI answers, trace rails, and bounded agents.
START HERE
Three ways into the repo.
Pick the surface that matches your attention span. Each path leads back into the same notebook spine.
Streamlit Dashboard
Open the teaching surface without Docker. Good if you want immediate interaction, market views, and a guided UI.
System view
Full Stack Platform
See the repo as a product: frontend, API, data model, orchestration, and the shape of a real application.
Modern capstone
Retrieval Lab
Go straight to content adapters, tagging, vector search, citations, and bounded agents.
CURRICULUM ARC
From foundations to systems.
The sequence is cumulative. The later chapters only make sense because the earlier ones teach the contracts underneath them.
Foundations
Python, data modeling, storage, processing, and streaming.
APIs and frontend
REST, GraphQL, Postgres, Hasura, and the first real product surface.
Embeddings and LLMs
Text comparison, embeddings, vector-store bridge notebooks, Anthropic setup, and LangChain-era tooling.
Quality and governance
Checks, dbt, and a new MDM and governance primer that stabilizes the enterprise layer.
Retrieval systems and agents
A practical capstone: evaluate retrieval, blend lexical and vector signals, then ingest content, index it, retrieve it, and answer with evidence.
Fractals and governance
An advanced cluster on Mandelbrot intuition, fractal descriptors, pattern recognition, MDM, scale-sensitive governance, and duplicate-cluster risk.
Fractal graphs
Three working bridges: time-series visibility graphs, box-covering on networks, and lineage as a stewardship object. Closes with a notebook that names the failure modes.
Fractal governance
Institutional theory, fractal-graph descriptors, and AI governance braided into one chapter. Pressure fields, the decoupling lens, the regulation cascade, and an AI parser with mock fallback. Nine notebooks; three labs; one capstone diagnostic.
Fractal indexing
The indexes you ship every day are fractal. Hilbert and Z-order curves built from scratch, the Faloutsos selectivity oracle revived, a tiny pure-Python HNSW, a DuckDB Liquid Clustering benchmark, and Hurst-driven time partitioning. Nine notebooks; three labs; one workload-to-index decision tool.
NEW ADVANCED LENS
Fractals, pattern recognition, and governance.
This is not a decorative detour. It starts with Mandelbrot because recursion, scale, and boundary sensitivity are easier to see than to define. Then it moves into pattern recognition, master data management, data governance, and a duplicate-cluster case study where threshold shifts become stewardship decisions.
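A minimal sketch of why a threshold shift is a stewardship decision. It is not the repo's case study: the similarity measure (standard-library difflib), the single-linkage grouping, and the sample names are all assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_clusters(records: list[str], threshold: float) -> list[list[str]]:
    """Group records linked by any pair at or above the threshold (single linkage)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for idx, record in enumerate(records):
        groups.setdefault(find(idx), []).append(record)
    return list(groups.values())


# Hypothetical customer names; watch the cluster count move with the threshold.
names = ["Acme Corp", "ACME Corp", "Acme Corporation", "Globex", "Globex Inc"]
for threshold in (1.0, 0.85, 0.6):
    print(threshold, len(duplicate_clusters(names, threshold)))
```

Lowering the threshold can only merge clusters, never split them. Each merge is a claim that two records are the same entity, which is why the choice belongs to a steward and not to a default value.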
CHAPTER 12 STUDIO
Fractal graphs: from time series to networks to lineage.
Most enterprise objects worth governing are graphs already. Chapter 12 walks across three bridges: a visibility map from a time series to a network, box-covering as the graph analog of 11.1, and a lineage view that ranks stewardship priorities by structural blast radius. The closing notebook names the four failure modes so the descriptors do not get oversold.
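The first bridge, from a time series to a network, can be sketched in a few lines. This assumes the natural visibility criterion: two points are linked when the straight line between them clears every point in between. The chapter may use a different variant, such as horizontal visibility, so treat the function name and the criterion as illustrative.

```python
def visibility_edges(series: list[float]) -> set[tuple[int, int]]:
    """Edges (i, j), i < j, between points that can 'see' each other.

    Assumed criterion (natural visibility): every intermediate point k
    must sit strictly below the line joining (i, y_i) and (j, y_j).
    """
    edges: set[tuple[int, int]] = set()
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k]
                < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


def degrees(series: list[float]) -> list[int]:
    """Node degree per time step; peaks tend to become hubs."""
    counts = [0] * len(series)
    for i, j in visibility_edges(series):
        counts[i] += 1
        counts[j] += 1
    return counts
```

Adjacent points are always connected, so the graph never fragments. This brute-force version is cubic in the series length, which is fine for a notebook and too slow for long series.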
CHAPTER 13 STUDIO
Fractal governance: pressure fields, AI mediation, and the visualization layer.
Three research streams (institutional theory, fractal-graph descriptors, AI governance) converge in code. Chapter 13 builds the apparatus and an interactive surface that lets you sketch a multi-scale pressure field, watch decoupling form between formal and operational signals, and edit a regulation cascade to see translation drift compound from regulator to practitioner. Closes with the four failure modes the apparatus must not hide.
CHAPTER 14 STUDIO
Fractal indexing: Hilbert, Z-order, and the hidden math of modern storage.
Apache Iceberg added Hilbert curve clustering in 2025. Delta Lake's Liquid Clustering reports up to 10x query acceleration. HNSW is a hierarchical navigable small-world graph. Most engineers ship these indexes without seeing the curve trace through a grid. Chapter 14 fixes that. Three labs make the locality property, the page-pruning win, and the HNSW scale separation visible at the speed of a click. Closes with the four failure modes that turn a benchmark win into a 2 AM incident.
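To see a curve trace through a grid before opening the labs, here is a minimal Z-order (Morton) sketch. It is not the chapter's implementation; the function name and the 16-bit default are assumptions. The Hilbert curve uses a different, rotation-aware construction that this example does not cover.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one sortable key.

    Bit i of x lands at position 2i, bit i of y at position 2i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Walk a 4x4 grid in key order to see the nested Z shape.
grid = [(x, y) for y in range(4) for x in range(4)]
for x, y in sorted(grid, key=lambda cell: z_order_key(*cell)):
    print(z_order_key(x, y), (x, y))
```

Sorting rows by this key keeps most spatial neighbors close together on disk, which is what lets a range query skip whole pages. The locality is imperfect: the curve jumps at quadrant boundaries, and those jumps are what the Hilbert curve is designed to avoid.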
APPLIED SYSTEMS
Two builds, two different forms of rigor.
One teaches product architecture. The other teaches grounded retrieval. Both are concrete enough to expose weak assumptions.
System A
Stock Trading Platform
- Next.js dashboard and Streamlit teaching layer
- Flask API, Hasura GraphQL, Postgres, SQLite support
- Ask Warren for market analysis and portfolio views
System B
Retrieval Lab
- Generic source adapter contract with NPS as the worked example
- Normalization, rule-assisted and model-assisted tagging
- Chroma retrieval, FastAPI answers, citations, and bounded tools
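A sketch of what a generic source adapter contract can look like, with a rule-assisted tagging step after normalization. Every name here is hypothetical (Document, SourceAdapter, InMemoryAdapter, rule_tag, ingest); the repo's actual interface may differ, and the in-memory adapter stands in for a real source such as the NPS example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass
class Document:
    """Normalized record handed to tagging and indexing (hypothetical shape)."""
    source: str
    doc_id: str
    text: str
    tags: list[str] = field(default_factory=list)


class SourceAdapter(Protocol):
    """Hypothetical contract: fetch raw items, normalize each into a Document."""
    name: str

    def fetch(self) -> Iterable[dict]: ...

    def normalize(self, raw: dict) -> Document: ...


class InMemoryAdapter:
    """Stand-in source for illustration; a real adapter would call an API."""
    name = "in_memory"

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def fetch(self) -> Iterable[dict]:
        return iter(self._rows)

    def normalize(self, raw: dict) -> Document:
        return Document(source=self.name, doc_id=str(raw["id"]), text=raw["body"].strip())


def rule_tag(doc: Document, rules: dict[str, list[str]]) -> Document:
    """Attach every tag whose keyword list matches the lowercased text."""
    lowered = doc.text.lower()
    doc.tags = sorted(tag for tag, words in rules.items() if any(w in lowered for w in words))
    return doc


def ingest(adapter: SourceAdapter, rules: dict[str, list[str]]) -> list[Document]:
    """Fetch, normalize, then tag. Indexing would follow in a real pipeline."""
    return [rule_tag(adapter.normalize(raw), rules) for raw in adapter.fetch()]
```

The point of the contract is that ingest never learns which source it is reading. Adding a source means writing one adapter, not touching tagging or retrieval.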
RUN IT
Fast ways to get your hands on it.
No abstraction here. These are the entry commands that matter.
Streamlit
git clone https://github.com/mhdk1602/python_training.git
cd python_training/streamlit-app
pip install -r requirements.txt
streamlit run app.py
Full stack
git clone https://github.com/mhdk1602/python_training.git
cd python_training
docker compose up -d
Chapter 10
cd python_training/chapter-10-rag-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
uvicorn main:app --reload --port 8001
KEEP READING
The repo still matters.
This site is the front door. The real substance is still in the notebooks, the code, and the build decisions underneath them.