GPU Calculator

Model size + quantization → how much VRAM you need and which GPUs can fit it.

INPUTS

Model size (parameter count), quantization (precision), context length (in thousands of tokens), and batch size (number of concurrent requests).

ESTIMATED VRAM NEEDED

The total estimate is broken down into three components:

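Weights: memory for the model parameters, roughly parameter count × bytes per parameter at the chosen precision.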
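
A minimal sketch of the weights term, assuming the usual rule of thumb that each parameter costs (bits of precision ÷ 8) bytes. The `BYTES_PER_PARAM` table and `weights_gb` name are illustrative assumptions, not this calculator's internals; real 4-bit formats also store per-group scales, so they typically land somewhat above 0.5 bytes per parameter.

```python
# Assumed bytes per parameter for common precisions (illustrative only;
# quantized formats add small per-group scale/zero-point overhead).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight memory in decimal GB.

    Billions of parameters × bytes per parameter = GB (1 GB = 1e9 bytes).
    """
    return params_billion * BYTES_PER_PARAM[quant]


print(weights_gb(7, "fp16"))   # 14.0
print(weights_gb(70, "int4"))  # 35.0
```

Halving the precision halves this term, which is why quantization is usually the first lever for fitting a model onto a smaller GPU.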

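KV cache: memory for cached attention keys and values, which grows linearly with context length and batch size.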
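
One common way to size the KV cache, sketched under assumptions: each layer stores one key and one value vector per KV head, per token, per sequence, at `bytes_per_elem` bytes each (FP16/BF16 assumed). The model shape in the example (32 layers, 32 KV heads, head dimension 128) is an assumed 7B-class configuration without grouped-query attention, and `kv_cache_gib` is a hypothetical helper.

```python
def kv_cache_gib(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    batch: int,
    bytes_per_elem: float = 2.0,  # assumed fp16/bf16 cache
) -> float:
    """Approximate KV cache size in GiB (2**30 bytes).

    The leading factor 2 accounts for storing both keys and values.
    """
    total_bytes = (
        2 * num_layers * num_kv_heads * head_dim
        * context_tokens * batch * bytes_per_elem
    )
    return total_bytes / 2**30


# Assumed 7B-class shape: 32 layers, 32 KV heads, head_dim 128.
print(kv_cache_gib(32, 32, 128, context_tokens=4096, batch=1))  # 2.0
print(kv_cache_gib(32, 32, 128, context_tokens=4096, batch=8))  # 16.0
```

Because context and batch multiply together, an 8-way batch at the same context needs 8× the cache. Models with grouped-query attention have fewer KV heads than query heads, which shrinks this term proportionally.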

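Overhead: activations, temporary buffers, and framework and CUDA runtime memory beyond the weights and KV cache.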
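
A sketch of how the three components might combine into one number. Modeling overhead as a flat fraction of weights plus KV cache, and the 20% default, are assumptions for illustration; this calculator's actual overhead rule is not stated. `total_vram_gb` is a hypothetical name.

```python
def total_vram_gb(
    weights_gb: float,
    kv_cache_gb: float,
    overhead_fraction: float = 0.20,  # assumed rule of thumb, not the tool's value
) -> dict[str, float]:
    """Combine components into a total VRAM estimate (all values in GB).

    Overhead is approximated as a fixed fraction of weights + KV cache,
    standing in for activations, buffers, and runtime context.
    """
    base = weights_gb + kv_cache_gb
    overhead = base * overhead_fraction
    return {
        "weights": weights_gb,
        "kv_cache": kv_cache_gb,
        "overhead": overhead,
        "total": base + overhead,
    }


print(total_vram_gb(14.0, 2.0))
```

With 14 GB of weights and 2 GB of KV cache, the assumed 20% overhead brings the total to about 19.2 GB. A fixed fraction is crude; a flat allowance for the runtime plus a per-token activation term would track real usage more closely.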

RECOMMENDED GPUS

GPUs with enough VRAM to hold the estimate.
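
The "which GPUs fit" step can be sketched as a simple filter over advertised memory capacity. The `GPUS` dictionary below is an illustrative sample, not this calculator's catalog, and the sketch only considers a single GPU (no multi-GPU sharding). Advertised capacity is also somewhat more than what a framework can actually allocate.

```python
# Illustrative sample of GPUs and their advertised VRAM in GB
# (not this calculator's list).
GPUS = {
    "RTX 4090": 24,
    "A100 40GB": 40,
    "L40S": 48,
    "A100 80GB": 80,
    "H100 80GB": 80,
}


def gpus_that_fit(needed_gb: float, gpus: dict[str, int] = GPUS) -> list[str]:
    """Return single GPUs whose VRAM covers the estimate, smallest first."""
    fitting = [(vram, name) for name, vram in gpus.items() if vram >= needed_gb]
    # Sorting (vram, name) tuples orders by capacity, then alphabetically.
    return [name for _, name in sorted(fitting)]


print(gpus_that_fit(19.2))  # ['RTX 4090', 'A100 40GB', 'L40S', 'A100 80GB', 'H100 80GB']
print(gpus_that_fit(50.0))  # ['A100 80GB', 'H100 80GB']
```

Listing the smallest fitting card first surfaces the cheapest option, but an estimate that sits close to a card's capacity leaves little headroom.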

→ VRAM estimates are approximate. Actual usage can differ from the estimate by roughly 10-30% in either direction, depending on the inference framework (vLLM, Transformers, llama.cpp), the KV cache implementation, and memory optimizations such as PagedAttention.
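
To turn that uncertainty into a sizing decision, one option is to compute a band around the estimate and size against its upper end. The 30% default tolerance takes the high end of the stated range; `vram_range_gb` is a hypothetical helper.

```python
def vram_range_gb(estimate_gb: float, tolerance: float = 0.30) -> tuple[float, float]:
    """Low/high bounds for an estimate that may be off by +/- tolerance."""
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    return estimate_gb * (1 - tolerance), estimate_gb * (1 + tolerance)


low, high = vram_range_gb(20.0)
print(f"plan for {low:.1f}-{high:.1f} GB")
```

For a 20 GB estimate this gives roughly 14-26 GB, so a 24 GB card may or may not be enough depending on the framework.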