Compare prices and performance across eleven GPUs.
Find the best GPU for your workload.
Relative tokens per second running Mistral 7B at half precision (FP16). Higher is better. GPUs compared, with hourly price:
H100 SXM5 80GB ($2.12/hr)
H100 PCIe 80GB ($1.99/hr)
A100 SXM4 80GB ($1.65/hr)
A100 PCIe 80GB ($1.63/hr)
RTX 6000 Ada 48GB ($1.07/hr)
L40 48GB ($1.07/hr)
RTX 4090 24GB ($0.41/hr)
RTX 3090 24GB ($0.21/hr)
RTX A6000 48GB ($0.47/hr)
V100 32GB ($0.42/hr)
Quadro RTX 8000 48GB ($0.32/hr)
Our Observations: For the smallest models, the 24 GB GeForce RTX cards (RTX 4090 and RTX 3090) are the most cost effective. For slightly larger models, the RTX 6000 Ada and L40 are the most cost effective, but if your model won't fit in 48 GB of VRAM, the H100 offers both the best price-to-performance ratio and the best raw performance.
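To compare cards on cost per token rather than raw speed, you can fold the hourly price into the throughput. Below is a minimal sketch; cost_per_million_tokens is a hypothetical helper, and the throughput figures are made-up placeholders, not the benchmark results above — substitute your own measurements.

```python
# Sketch: dollars to generate one million tokens, given an hourly GPU
# price and a sustained generation rate.

def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    seconds_needed = 1_000_000 / tokens_per_second
    return price_per_hour * seconds_needed / 3600

# Hypothetical Mistral 7B FP16 throughputs (tokens/s), for illustration only.
gpus = {
    "H100 SXM5 80GB": (2.12, 120.0),
    "RTX 4090 24GB": (0.41, 60.0),
    "RTX 3090 24GB": (0.21, 30.0),
}

for name, (price_per_hour, tps) in gpus.items():
    print(f"{name}: ${cost_per_million_tokens(price_per_hour, tps):.2f} per 1M tokens")
```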
Relative iterations per second training a ResNet-50 CNN on the CIFAR-10 dataset. Higher is better. GPUs compared, with hourly price:
H100 SXM5 80GB ($2.12/hr)
H100 PCIe 80GB ($1.99/hr)
A100 SXM4 80GB ($1.65/hr)
A100 PCIe 80GB ($1.63/hr)
RTX 6000 Ada 48GB ($1.07/hr)
RTX 4090 24GB ($0.43/hr)
L40 48GB ($1.07/hr)
RTX 3090 24GB ($0.21/hr)
RTX A6000 48GB ($0.47/hr)
V100 32GB ($0.42/hr)
Quadro RTX 8000 48GB ($0.32/hr)
Our Observations: For training, nothing beats the H100 and A100 GPUs. Machine-learning-optimized performance coupled with 80 GB of VRAM makes both a compelling choice. Deploy 8x SXM configurations when available to take full advantage of multi-GPU parallelism, as in the sketch below.
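As one way to use all eight GPUs on an SXM node, here is a minimal data-parallel training sketch using PyTorch DistributedDataParallel. The script name, batch size, learning rate, and epoch count are illustrative assumptions, not a recommended recipe.

```python
# Minimal sketch: ResNet-50 on CIFAR-10 across all visible GPUs.
# Launch with: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
import torchvision
import torchvision.transforms as T
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group("nccl")            # one process per GPU
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torchvision.models.resnet50(num_classes=10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    if rank == 0:                              # download once, not eight times
        torchvision.datasets.CIFAR10("./data", train=True, download=True)
    dist.barrier()
    dataset = torchvision.datasets.CIFAR10("./data", train=True, transform=T.ToTensor())
    sampler = DistributedSampler(dataset)      # shards the dataset per rank
    loader = DataLoader(dataset, batch_size=256, sampler=sampler,
                        num_workers=4, pin_memory=True)

    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(5):
        sampler.set_epoch(epoch)               # reshuffle shards each epoch
        for images, labels in loader:
            images = images.cuda(local_rank, non_blocking=True)
            labels = labels.cuda(local_rank, non_blocking=True)
            opt.zero_grad()
            loss_fn(model(images), labels).backward()  # DDP syncs gradients here
            opt.step()
        if rank == 0:
            print(f"epoch {epoch} done")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```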
Time to process one batch of tokens, 90th percentile (p90), Mistral 7B at half precision (FP16). Lower is better. GPUs compared, with hourly price:
H100 SXM5 80GB ($2.12/hr)
H100 PCIe 80GB ($1.99/hr)
A100 SXM4 80GB ($1.65/hr)
A100 PCIe 80GB ($1.63/hr)
RTX 4090 24GB ($0.43/hr)
L40 48GB ($1.07/hr)
RTX 3090 24GB ($0.21/hr)
RTX 6000 Ada 48GB ($1.07/hr)
RTX A6000 48GB ($0.47/hr)
V100 32GB ($0.42/hr)
Quadro RTX 8000 48GB ($0.32/hr)
Our Observations: LLM latency matters. The slower your model responds, the more likely you are to churn a customer. The H100s and A100s are the best performers, but the Ada and GeForce RTX cards are far more cost effective if your model doesn't need the full 80 GB of VRAM. A simple way to measure p90 latency on your own stack is sketched below.
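Here is a minimal sketch of measuring p90 batch latency yourself. generate_batch is a hypothetical stand-in for your own inference call (e.g. one Mistral 7B FP16 forward pass), and the warmup and trial counts are arbitrary choices.

```python
# Sketch: p90 per-batch latency for an arbitrary inference callable.
import time
import statistics

def p90_latency(generate_batch, n_warmup: int = 5, n_trials: int = 50) -> float:
    """Return the 90th-percentile latency in seconds over n_trials runs.

    For GPU inference, generate_batch should block until the work is
    actually finished (e.g. call torch.cuda.synchronize() at the end),
    otherwise the timings only reflect kernel launch overhead.
    """
    for _ in range(n_warmup):      # warm up caches and compiled kernels
        generate_batch()
    latencies = []
    for _ in range(n_trials):
        start = time.perf_counter()
        generate_batch()
        latencies.append(time.perf_counter() - start)
    return statistics.quantiles(latencies, n=10)[-1]  # 9th cut point = p90
```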
The industry's most cost-effective GPU cloud