Transformer Journey
Core Concepts
Introduction
Distributed Setup
RMSNorm (Production)
RoPE (Fused)
Production Embedding
Tensor Parallel Linear
Fused QKV Projection
FlashAttention
SwiGLU Activation
Transformer Block
GPT-4 Initialization
Coming Soon
Mixed Precision
Causal Masking