Machine Learning GPU Benchmarks

Compare prices and performance across eleven GPUs.
Find the best GPU for your workload.


LLM Inference Throughput

Relative tokens per second on Mistral 7B at half precision (FP16). Higher is better.

H100 SXM5 80GB ($2.12/hr)
H100 PCIE 80GB ($1.99/hr)
A100 SXM4 80GB ($1.65/hr)
A100 PCIE 80GB ($1.63/hr)
RTX 6000 Ada 48GB ($1.07/hr)
L40 48GB ($1.07/hr)
RTX 4090 24GB ($0.41/hr)
RTX 3090 24GB ($0.21/hr)
RTX A6000 48GB ($0.47/hr)
V100 32GB ($0.42/hr)
Quadro 8000 48GB ($0.32/hr)

TensorDock Advice: For the smallest models, the 24 GB GeForce RTX and Ada cards are the most cost effective. For slightly larger models, the RTX 6000 Ada and L40 are the most cost effective. If your model doesn't fit in 48 GB of VRAM, the H100 provides the best price-to-performance ratio as well as the best raw performance.
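If you want to reproduce a throughput number like the ones above, here is a minimal sketch using Hugging Face transformers. The model ID, prompt, and generation length are illustrative assumptions, not TensorDock's exact harness; absolute numbers will differ from the chart, but relative rankings across GPUs should track it.

```python
# Minimal tokens/sec sketch: greedy generation from Mistral 7B in FP16.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-v0.1"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

# Warm up once so kernel compilation and caching don't skew the timing.
model.generate(**inputs, max_new_tokens=16)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

Dividing the measured tokens/sec by a GPU's hourly price gives the tokens-per-dollar figure behind the cost-effectiveness advice above.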

Deep Learning Training Speed

Relative iterations per second training a ResNet-50 CNN on the CIFAR-10 dataset. Higher is better.

H100 SXM5 80GB ($2.12/hr)
H100 PCIE 80GB ($1.99/hr)
A100 SXM4 80GB ($1.65/hr)
A100 PCIE 80GB ($1.63/hr)
RTX 6000 Ada 48GB ($1.07/hr)
RTX 4090 24GB ($0.43/hr)
L40 48GB ($1.07/hr)
RTX 3090 24GB ($0.21/hr)
RTX A6000 48GB ($0.47/hr)
V100 32GB ($0.42/hr)
Quadro 8000 48GB ($0.32/hr)

TensorDock Advice: For training, nothing beats the H100 and A100 GPUs. Machine learning-optimized performance coupled with an incredible 80 GB of VRAM makes both a compelling choice. Deploy 8x SXM systems when available to take full advantage of parallelism.
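To sanity-check a training-speed figure like this yourself, a minimal PyTorch sketch follows. Batch size, optimizer settings, and FP32 precision are assumptions; this page doesn't specify the exact harness behind the chart.

```python
# Iterations/sec sketch: ResNet-50 training on CIFAR-10 with torchvision.
import time
import torch
import torchvision
from torchvision import transforms

device = "cuda"

# CIFAR-10 images are 32x32; upscale to 224x224, the input size
# ResNet-50 was designed for. Batch size 128 is an assumption.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])
dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
loader = torch.utils.data.DataLoader(
    dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True
)

model = torchvision.models.resnet50(num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

model.train()
iters, start = 0, None
for images, labels in loader:
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    optimizer.zero_grad(set_to_none=True)
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    iters += 1
    if iters == 10:
        torch.cuda.synchronize()
        start = time.perf_counter()  # start timing after warm-up iterations
    if iters == 110:
        break

torch.cuda.synchronize()
print(f"{(iters - 10) / (time.perf_counter() - start):.2f} iterations/sec")
```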

LLM Batch Latency

Time to process one batch of tokens (p90) on Mistral 7B at half precision (FP16). Lower is better.

H100 SXM5 80GB ($2.12/hr)
H100 PCIE 80GB ($1.99/hr)
A100 SXM4 80GB ($1.65/hr)
A100 PCIE 80GB ($1.63/hr)
RTX 4090 24GB ($0.43/hr)
L40 48GB ($1.07/hr)
RTX 3090 24GB ($0.21/hr)
RTX 6000 Ada 48GB ($1.07/hr)
RTX A6000 48GB ($0.47/hr)
V100 32GB ($0.42/hr)
Quadro 8000 48GB ($0.32/hr)

TensorDock Advice: LLM latency matters: the slower your model responds, the more likely you are to churn a customer. The H100s and A100s are the best performers, but the Ada and RTX cards are much more cost effective if your model doesn't need the full 80 GB of VRAM.
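Here is a minimal sketch for measuring p90 batch latency, again with an assumed model ID, batch size, and repeat count rather than the exact harness behind the chart.

```python
# p90 batch-latency sketch: time repeated forward passes of a fixed
# batch through Mistral 7B in FP16 and report the 90th percentile.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-v0.1"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16
).to("cuda").eval()

# Batch of 8 identical prompts is an illustrative assumption.
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
prompts = ["Explain GPU benchmarking in one sentence."] * 8
batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

latencies = []
with torch.no_grad():
    for _ in range(50):
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(**batch)  # one forward pass over the whole batch
        torch.cuda.synchronize()
        latencies.append(time.perf_counter() - start)

latencies.sort()
p90 = latencies[int(0.9 * len(latencies)) - 1]  # nearest-rank p90
print(f"p90 batch latency: {p90 * 1000:.1f} ms")
```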


World-class enterprise support

Delivered by dedicated professionals

Deploy your first TensorDock server.

And you'll never look back.
