Fractal Indexing: Hilbert, Z-order, HNSW

CHAPTER 14 STUDIO

The indexes you ship every day are fractal. Watch the math draw.

Apache Iceberg added Hilbert curve clustering in 2025. Delta Lake's Liquid Clustering reports up to 10x query acceleration. Uber's H3 and Google's S2 are hierarchical fractal subdivisions. HNSW is a hierarchical small-world graph. Most engineers who use these indexes have never seen the curve trace through a grid. This page fixes that. Three labs make the locality property, the page-pruning win, and the HNSW scale separation visible at the speed of a click.

9 notebooks in the cluster
3 interactive labs
10x production speedup target

LAB 1: CURVE TRACE ANIMATOR

Watch a Hilbert or Z-order curve fill a 32 x 32 grid. Drag the query rectangle.

Pick a curve. Pick a speed. Drag the green rectangle anywhere on the grid. The page-fetch counters update in real time and show how many "tiles" each ordering forces a query to read. Hilbert wins on average; the gap is bigger for small queries.

Drag the green rectangle. Click PLAY to retrace.
14 cells per second
Pages touched (this curve) 0
Pages touched (row-major) 0
Pages saved vs row-major 0%

Each "page" holds a run of 16 contiguous curve cells. Production engines tune this to thousands of rows; the savings ratio is the same.
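The page counter can be sketched in a few lines. This is an illustrative model, not the lab's actual code: a 32 x 32 grid, Morton (Z-order) bit interleaving, and 16-cell pages as described above.

```python
def z_order(x, y, bits=5):
    """Morton code: interleave the bits of x and y (32 x 32 grid -> 5 bits each)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits land in even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits land in odd positions
    return code

def row_major(x, y):
    """Plain row-by-row ordering for comparison."""
    return y * 32 + x

def pages_touched(order_fn, x0, y0, w, h, page_size=16):
    """Distinct 16-cell pages a query rectangle forces a scan to read."""
    pages = {order_fn(x, y) // page_size
             for x in range(x0, x0 + w)
             for y in range(y0, y0 + h)}
    return len(pages)

# An 8 x 8 query box at (4, 4): Z-order keeps the box inside four
# aligned 4 x 4 blocks, while row-major slices it into one page per row.
print(pages_touched(z_order, 4, 4, 8, 8))    # → 4
print(pages_touched(row_major, 4, 4, 8, 8))  # → 8
```

The 2x gap here is the same effect the counters above show live; it widens or narrows with the box's position and size.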

LAB 2: INDEX RACE TRACK

5,000 random points. Four indexes. One query box. Watch them race.

Drag the query rectangle anywhere on the canvas. The bar chart on the right updates instantly with pages-fetched and points-returned for each ordering. Use the skew slider to redistribute the points; Hilbert and Z-order pull farther ahead as skew increases.

Drag the green box. Click REGENERATE to draw new points.
Skew: 60, on a scale from 0 (uniform) to 100 (heavy clusters)
Points in query box 0
Best index for this query --
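The Hilbert ordering that usually wins the race can be sketched with the classic rotate-and-flip mapping. A sketch for illustration, not the lab's code:

```python
def hilbert_d(n, x, y):
    """Distance of cell (x, y) along the Hilbert curve on an n x n grid
    (n a power of two). Classic iterative rotate-and-flip formulation."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate/flip the quadrant so recursion lines up
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Locality check: consecutive curve positions are always adjacent cells,
# which is exactly why Hilbert pulls ahead as the points cluster.
n = 32
pos = {hilbert_d(n, x, y): (x, y) for x in range(n) for y in range(n)}
steps = [abs(pos[d][0] - pos[d + 1][0]) + abs(pos[d][1] - pos[d + 1][1])
         for d in range(n * n - 1)]
print(all(step == 1 for step in steps))  # → True
```

Z-order has no such guarantee: its curve takes long diagonal jumps between quadrants, and those jumps are where its extra page fetches come from.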

LAB 3: HNSW CLIMBER

Watch a search descend through the layers.

Click anywhere on the canvas to drop a query point. The HNSW search starts at the top layer and descends one layer per step. Visited nodes flash. The visit-count panel on the right shows why the descent is logarithmic instead of linear. Use the M slider to rebuild the graph with denser or sparser connections.

Click anywhere to drop a query.
M = 5 (production HNSW defaults to 16)
80 points across all layers
Layers built 0
Visited nodes 0
Speedup vs brute force --

Brute force checks every point. HNSW visits a handful of nodes on each of roughly log(N) layers, so total work grows like log(N) instead of N. The speedup grows with N.
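The scale separation behind that claim comes from HNSW's layer-assignment rule: each node's top layer is drawn as floor(-ln(U) * m_L), so each layer holds roughly an M-th of the nodes of the layer below it. A minimal sketch of just that rule (variable names are ours, not any library's API):

```python
import math
import random

def assign_layer(m_l, rng):
    """Draw a node's top layer: floor(-ln(U) * m_L), the HNSW rule.
    Higher layers are exponentially rarer, which makes the descent logarithmic."""
    return int(-math.log(rng.random()) * m_l)

rng = random.Random(0)
M = 16                     # max connections per node (production default)
m_l = 1 / math.log(M)      # level-normalization factor from the HNSW paper
layers = [assign_layer(m_l, rng) for _ in range(10_000)]

# Roughly 1/M of the nodes reach layer >= 1, 1/M^2 reach layer >= 2, ...
for lvl in range(4):
    print(lvl, sum(1 for l in layers if l >= lvl))
```

With M = 16 and 10,000 nodes, expect on the order of 600 nodes at layer 1 and a few dozen at layer 2: a sparse top graph for coarse routing, dense bottom layers for the final refinement.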

CHAPTER 14 NOTEBOOK PATH

Each lab has notebooks behind it.

The studio is the surface. The chapter is where the algorithms are built and the failure modes are named.

14.0

Why indexes are already fractal

Frames the chapter. Three production examples (Iceberg, Liquid Clustering, HNSW). One-paragraph history.

14.1 - 14.2

Curves, locality, and Hilbert R-tree

Build Z-order and Hilbert in NumPy. Reproduce Kamel-Faloutsos (1994) Hilbert R-tree bulk loading.

14.3

Fractal dimension as a selectivity oracle

Reproduce the 1994 Faloutsos selectivity formula. Beat the uniform predictor by an order of magnitude on skewed data.
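The core measurement can be sketched with a box-counting dimension estimate: count occupied grid cells at several scales and regress in log-log space. A toy under our own function names, not the notebook's code (the notebook's selectivity formula plugs a fractal dimension like this into its prediction):

```python
import numpy as np

def box_count_dimension(points, scales=(2, 4, 8, 16, 32)):
    """Box-counting dimension of 2-D points in [0, 1)^2:
    slope of log(occupied boxes) against log(scale)."""
    counts = [len({(int(x * s), int(y * s)) for x, y in points})
              for s in scales]
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
uniform = rng.random((5000, 2))       # fills the plane: dimension near 2
t = rng.random(5000)
diagonal = np.column_stack([t, t])    # points on a line: dimension near 1

print(box_count_dimension(uniform))   # ~ 2
print(box_count_dimension(diagonal))  # ~ 1
```

A uniform predictor implicitly assumes dimension 2; skewed real data sits well below that, which is where the order-of-magnitude selectivity win comes from.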

14.4

HNSW as a small-world index

Pure-Python HNSW. Visualize layer assignment. Connect to Watts-Strogatz, Barabasi, Song-Havlin-Makse.

14.5 - 14.6

DuckDB clustering and Hurst-aware partitioning

Run Liquid Clustering at home. Drive time-series chunk boundaries by the Hurst exponent of the stream.
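A toy version of the Hurst measurement, using the rescaled-range (R/S) method; function name and window sizes are our own choices, not the notebook's:

```python
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Rescaled-range (R/S) estimate of the Hurst exponent:
    slope of log(mean R/S) against log(window size)."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())   # cumulative deviation from the mean
            r = dev.max() - dev.min()       # range R
            s = w.std()                     # scale S
            if s > 0:
                rs.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs)))
    return np.polyfit(log_n, log_rs, 1)[0]

rng = np.random.default_rng(0)
h_noise = hurst_rs(rng.standard_normal(4096))            # near 0.5: no memory
h_walk = hurst_rs(np.cumsum(rng.standard_normal(4096)))  # near 1: strong trend
print(h_noise, h_walk)
```

A stream with high Hurst exponent trends for long stretches, so larger chunks stay internally similar; a noisy stream gains nothing from them. That is the signal a Hurst-aware partitioner reads.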

14.7 - 14.8

Capstone and the failure modes

Recommend an index for your workload. Then read the four failure modes before you ship anything.