← back to .plan

Villa Straylight Papers - Part IV: Take Your Word, Thief

Weyl Team 4 min read
// CUDA //// GPU //// Architecture //// Formal Methods //// Lean //// razorgirl //// Composition //
← Part III: Built Him up From Nothing//← Back to Introduction

Part IV: Take Your Word, Thief

Composition, the Tensor Core Cathedral, and Jensen’s Razor

Composition is the final piece. It’s how you stack layouts:

Each layer is a layout transformation. Composing them gives you the full path from DRAM to silicon.


The algebra of composition

Given layouts A : Coord(S_A) → Offset and B : Coord(S_B) → Offset:

Composition: (B ∘ A)(c) = B(A(c))

But this only works if:

  1. A’s image is contained in B’s coordinate space
  2. The divisibility constraints are preserved

NVIDIA’s documentation introduces LeftDivisible as the admissibility predicate:

def LeftDivisible (A B : Layout) : Prop :=
∀ i, B.modes[i].stride ∣ (A.cosize * B.modes[i].stride)

This ensures that composing A then B doesn’t violate the tiling constraints at each level.


Why admissibility must be explicit

Consider stacking:

Each level has divisibility requirements. If your warp tile isn’t divisible by your thread tile, you get:


Lean 4: typed composition

def compose (A B : Layout) (h : LeftDivisible A B) : Layout :=
sorry -- construction via CuTe's composition rules
theorem compose_sound (A B : Layout) (h : LeftDivisible A B) :
∀ c, (compose A B h).eval c = B.eval (A.eval c) :=
sorry -- proof that composition preserves semantics

The key: You cannot compose layouts without proving LeftDivisible.


Jensen’s Razor (Reprise)

Never attribute to search what can be proven by construction.

NVIDIA gives you:

We give you:


The Blade

-- This compiles:
def validKernel : CUDAKernel :=
let globalLayout := (128, 128) : (128, 1)
let smemLayout := (16, 8) : (8, 1) -- proof: 16*8128*128
let regLayout := (16, 16) : (16, 1) -- proof: 16*1616*8
compile (compose (compose globalLayout smemLayout ?h1) regLayout ?h2)
-- This doesn't:
def invalidKernel : CUDAKernel :=
let globalLayout := (128, 128) : (128, 1)
let smemLayout := (17, 7) : (7, 1) -- Error: 17*7128*128
compile (compose globalLayout smemLayout ?proof)
-- ^^^^^^
-- failed to synthesize

Coda

“Take your word, thief.” He jacked.


Appendix: Key Documents Referenced

The nvfuser documentation studied includes:

All BSD-3-Clause licensed. All waiting to be encoded as types.



← Part III: Built Him up From Nothing//← Back to Introduction