PUBLIC GITHUB PAGES

Live from a public repo.

Data engineering, taught through shipped systems.

This repo starts with Python, schemas, storage, processing, and streaming. It moves into two serious builds: a trading platform and a retrieval lab that tags content, indexes it, and answers with citations. It now also carries an advanced research lens on fractals, pattern recognition, MDM, and governance.

76 notebooks
2 applied systems
4 research studios

Applied System A

Stock Trading Platform

Next.js, Flask, Hasura, Postgres, a Streamlit teaching surface, and market analysis through Ask Warren.

Applied System B

Retrieval Lab

Source adapters, content tagging, Chroma-backed retrieval, FastAPI answers, trace rails, and bounded agents.

START HERE

Three ways into the repo.

Pick the surface that matches your attention span. Each path leads back into the same notebook spine.

Streamlit Dashboard

Open the teaching surface without Docker. Good if you want immediate interaction, market views, and a guided UI.

Open folder

Full Stack Platform

See the repo as a product: frontend, API, data model, orchestration, and the shape of a real application.

Open folder

Retrieval Lab

Go straight to content adapters, tagging, vector search, citations, and bounded agents.

Open folder

CURRICULUM ARC

From foundations to systems.

The sequence is cumulative. The later chapters only make sense because the earlier ones teach the contracts underneath them.

00–05

Foundations

Python, data modeling, storage, processing, and streaming.

Browse notebooks

06

APIs and frontend

REST, GraphQL, Postgres, Hasura, and the first real product surface.

Open chapter

07–08

Embeddings and LLMs

Text comparison, embeddings, vector-store bridge notebooks, Anthropic setup, and LangChain-era tooling.

Open bridge

09

Quality and governance

Checks, dbt, and a new MDM and governance primer that stabilizes the enterprise layer.

Open chapter

NEW ADVANCED LENS

Fractals, pattern recognition, and governance.

This is not a decorative detour. It starts with Mandelbrot because recursion, scale, and boundary sensitivity are easier to see than to define. Then it moves into pattern recognition, master data management, data governance, and a duplicate-cluster case study where threshold shifts become stewardship decisions.
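
To make the threshold point concrete, here is a minimal sketch of a duplicate-cluster pass. It is not the notebook's implementation: the sample records, the `similarity` and `cluster` helpers, and the choice of `difflib` string similarity with single-linkage merging are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (illustrative metric only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster(records, threshold):
    """Single-linkage clustering: merge any pair whose similarity meets the threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical records, not taken from the repo's case study.
RECORDS = ["Acme Corp", "ACME Corp", "Acme Corporation", "Globex Inc"]

if __name__ == "__main__":
    for threshold in (1.0, 0.8, 0.5, 0.0):
        print(threshold, cluster(RECORDS, threshold))
```

Lowering the threshold can only merge clusters, never split them. Each step down is therefore a decision about which records a steward is willing to treat as the same entity.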

CHAPTER 12 STUDIO

Fractal graphs: from time series to networks to lineage.

Most enterprise objects worth governing are graphs already. Chapter 12 walks across three bridges: a visibility map from a time series to a network, box-covering as the graph analog of 11.1, and a lineage view that ranks stewardship priorities by structural blast radius. The closing notebook names the four failure modes so the descriptors do not get oversold.
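
One common way to map a time series onto a network is the natural visibility graph, where two samples are linked if the straight line between them clears every sample in between. The sketch below assumes that construction; the chapter may use a different variant, and the function names are illustrative.

```python
def visibility_edges(series):
    """Return edges (i, j) of the natural visibility graph of a time series.

    Samples i < j are linked when every intermediate sample k lies strictly
    below the straight line joining (i, series[i]) and (j, series[j]).
    """
    n = len(series)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.append((i, j))
    return edges


def degrees(n, edges):
    """Node degrees, the simplest structural descriptor of the resulting graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    valley = [3, 1, 2]   # the endpoints see each other over the dip
    peak = [1, 3, 1]     # the middle sample blocks the endpoints
    print(visibility_edges(valley), degrees(3, visibility_edges(valley)))
    print(visibility_edges(peak), degrees(3, visibility_edges(peak)))
```

This brute-force version checks every pair and is quadratic or worse in the series length, so it suits short teaching series rather than production data.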

CHAPTER 13 STUDIO

Fractal governance: pressure fields, AI mediation, and the visualization layer.

Three research streams (institutional theory, fractal-graph descriptors, AI governance) converge in code. Chapter 13 builds the apparatus and an interactive surface that lets you sketch a multi-scale pressure field, watch decoupling form between formal and operational signals, and edit a regulation cascade to see translation drift compound from regulator to practitioner. Closes with the four failure modes the apparatus must not hide.

CHAPTER 14 STUDIO

Fractal indexing: Hilbert, Z-order, and the hidden math of modern storage.

Apache Iceberg added Hilbert curve clustering in 2025. Delta Lake's Liquid Clustering reports up to 10x query acceleration. HNSW (Hierarchical Navigable Small World) is a layered proximity graph. Most engineers ship these indexes without seeing the curve trace through a grid. Chapter 14 fixes that. Three labs make the locality property, the page-pruning win, and the HNSW scale separation visible at the speed of a click. Closes with the four failure modes that turn a benchmark win into a 2 AM incident.
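
The locality property can be sketched in a few lines. The example below uses Z-order (Morton) keys rather than a Hilbert curve because bit interleaving is the shortest to write down. It is an illustration under that assumption, not how Iceberg or Delta Lake implement clustering, and the function names are hypothetical.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x takes the even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y takes the odd bit positions
    return key


def pages_by_z_order(side, page_size):
    """Sort every cell of a side x side grid by Morton key, then cut into pages."""
    cells = sorted(
        ((x, y) for y in range(side) for x in range(side)),
        key=lambda cell: morton_key(*cell),
    )
    return [cells[i:i + page_size] for i in range(0, len(cells), page_size)]


def bounding_box(page):
    """Min/max of x and y for one page: the statistics a reader can prune on."""
    xs = [x for x, _ in page]
    ys = [y for _, y in page]
    return (min(xs), max(xs)), (min(ys), max(ys))


if __name__ == "__main__":
    for page in pages_by_z_order(side=4, page_size=4):
        print(bounding_box(page))
```

On a 4x4 grid with four cells per page, each page covers one 2x2 quadrant, so a filter on a small x and y range can skip most pages from their min/max statistics alone. Sorting the same grid row by row would leave every page spanning the full x range.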

APPLIED SYSTEMS

Two builds, two different forms of rigor.

One teaches product architecture. The other teaches grounded retrieval. Both are concrete enough to expose weak assumptions.

System A

Stock Trading Platform

  • Next.js dashboard and Streamlit teaching layer
  • Flask API, Hasura GraphQL, Postgres, SQLite support
  • Ask Warren for market analysis and portfolio views

System B

Retrieval Lab

  • Generic source adapter contract with NPS as the worked example
  • Normalization, rule-assisted and model-assisted tagging
  • Chroma retrieval, FastAPI answers, citations, and bounded tools
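
As a rough picture of what an adapter contract plus rule-assisted tagging can look like, here is a minimal sketch. Every name in it (`Document`, `SourceAdapter`, `InMemoryAdapter`, `RULES`, `tag`, `ingest`) is hypothetical and not taken from the repo. The real contract, the NPS adapter, and the model-assisted tagging path live in the lab's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol


@dataclass
class Document:
    doc_id: str
    text: str
    tags: List[str] = field(default_factory=list)


class SourceAdapter(Protocol):
    """Hypothetical contract: any source that can yield normalized documents."""

    def fetch(self) -> Iterable[Document]: ...


class InMemoryAdapter:
    """Stand-in source for illustration; a real adapter would call an API or read files."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = rows

    def fetch(self) -> Iterable[Document]:
        for doc_id, raw in self.rows.items():
            # Normalization step: collapse runs of whitespace.
            yield Document(doc_id=doc_id, text=" ".join(raw.split()))


# Hypothetical keyword-to-tag rules for the rule-assisted pass.
RULES = {"trail": "hiking", "camp": "camping"}


def tag(doc: Document) -> Document:
    """Attach every tag whose keyword appears in the document text."""
    lowered = doc.text.lower()
    doc.tags = sorted({label for keyword, label in RULES.items() if keyword in lowered})
    return doc


def ingest(adapter: SourceAdapter) -> List[Document]:
    """Works with any object that satisfies the SourceAdapter protocol."""
    return [tag(doc) for doc in adapter.fetch()]


if __name__ == "__main__":
    print(ingest(InMemoryAdapter({"a": "  Trail   map and campground  "})))
```

Because `SourceAdapter` is a structural protocol, a new source only needs a matching `fetch` method. The tagging and indexing steps downstream do not change.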

RUN IT

Fast ways to get your hands on it.

No abstraction here. These are the entry commands that matter.

Streamlit

git clone https://github.com/mhdk1602/python_training.git
cd python_training/streamlit-app
pip install -r requirements.txt
streamlit run app.py

Full stack

git clone https://github.com/mhdk1602/python_training.git
cd python_training
docker compose up -d

Retrieval Lab (Chapter 10)

cd python_training/chapter-10-rag-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
uvicorn main:app --reload --port 8001

KEEP READING

The repo still matters.

This site is the front door. The real substance is still in the notebooks, the code, and the build decisions underneath them.