Transformer Journey

    Core Concepts

  • Introduction
  • Distributed Setup
  • RMSNorm (Production)
  • RoPE (Fused)
  • Production Embedding
  • Tensor Parallel Linear
  • Fused QKV Projection
  • FlashAttention
  • SwiGLU Activation
  • Transformer Block
  • GPT-4 Initialization

    Coming Soon

  • Mixed Precision
  • Causal Masking

Table of Contents

Follow my journey of building a production-grade transformer from scratch.