// what it does

A neural network framework built entirely in NumPy. BERT and GPT architectures from scratch, including multi-head attention, positional encoding, and custom training loops. CuPy for GPU acceleration where needed.

// why I built this

I wanted "I debugged the gradient flow" understanding, not "I read the paper" understanding. When you build a transformer from raw matrix operations, you learn what attention actually computes. This was a learning project that got out of hand in the best way.

// stack

NumPy, CuPy, pure Python. No PyTorch, no TensorFlow, no shortcuts.