What Is FLOPS (Floating Point Operations Per Second)?
FLOPS (Floating Point Operations Per Second) is a unit of computing performance that counts how many floating-point arithmetic operations a processor can execute each second. It is the primary metric for comparing the capabilities of AI training and inference hardware.
How FLOPS (Floating Point Operations Per Second) Works
Neural network training and inference consist primarily of floating-point operations: multiplications and additions on decimal numbers inside matrix computations. FLOPS measures how many of these operations hardware can perform per second. Modern AI GPUs operate in the teraFLOPS (10^12) to petaFLOPS (10^15) range; an NVIDIA H100 delivers about 1,979 teraFLOPS of FP8 performance. A related quantity, usually written FLOPs (a total count, not a rate), is the number of floating-point operations a training run performs in total, and it is a useful metric for comparing model scales: GPT-3 required approximately 3.14 x 10^23 FLOPs to train, while GPT-4 is estimated to have required 10-100x more. Understanding both numbers helps estimate training time, hardware costs, and model scaling behavior.
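The total-compute figures above can be approximated with the widely used 6ND rule of thumb: a dense transformer performs roughly 6 floating-point operations per parameter per training token. As a sketch (the parameter and token counts below are published estimates for GPT-3, not exact values):

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate total floating-point operations (FLOPs) to train
    a dense model, using the ~6 * parameters * tokens rule of thumb."""
    return 6 * params * tokens

# GPT-3: ~175 billion parameters trained on ~300 billion tokens
gpt3_flops = training_flops(params=175e9, tokens=300e9)
print(f"GPT-3: ~{gpt3_flops:.2e} FLOPs")  # on the order of 3e23
```

Plugging in the GPT-3 estimates gives about 3.15 x 10^23 FLOPs, in line with the figure quoted above.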
Real-World Examples
NVIDIA's H100 GPU delivering approximately 2 petaFLOPS of FP8 AI performance for training workloads
Researchers estimating that training GPT-4 required roughly 2 x 10^25 FLOPs of compute over several months
A team calculating that their training run needs 10^21 FLOPs of total compute and provisioning the right number of GPUs accordingly