ArthurChiao's Blog

GPU Performance (Data Sheets) Quick Reference (2023)

Published at 2023-10-25 | Last Update 2023-10-25

This post provides a concise reference for the performance of popular GPU models from NVIDIA and Huawei/HiSilicon, primarily intended for personal use.



1 Introduction

Naming convention of NVIDIA GPUs

The first letter of a GPU model name denotes its architecture:

  1. T for Turing;
  2. A for Ampere;
  3. V for Volta;
  4. H for Hopper (2022);
  5. L for Ada Lovelace.
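The naming rule above can be sketched as a small lookup. This is only an illustration: `ARCH_BY_PREFIX` and `architecture_of` are hypothetical names, not part of any NVIDIA tool, and the mapping covers only the five letters listed in this post.

```python
# Prefix-to-architecture mapping, copied from the list above.
ARCH_BY_PREFIX = {
    "T": "Turing",
    "A": "Ampere",
    "V": "Volta",
    "H": "Hopper",
    "L": "Ada Lovelace",
}


def architecture_of(model: str) -> str:
    """Return the architecture implied by the first letter of a model name.

    Hypothetical helper for illustration; returns "unknown" for any
    prefix that is not in the list above.
    """
    prefix = model.strip()[:1].upper()
    return ARCH_BY_PREFIX.get(prefix, "unknown")


# Model names taken from the tables in this post.
for name in ("T4", "A10G", "V100", "H800"):
    print(f"{name}: {architecture_of(name)}")
```

The lookup only reads the first letter, so it says nothing about form factor (PCIe vs. SXM) or regional variants such as A800/H800.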

2 Comparison of T4/A10/A10G/A30/V100

| | T4 | A10 | A10G | A30 | V100 PCIe/SXM2 |
|---|---|---|---|---|---|
| Designed for | Data center workloads | (Desktop) Graphics-intensive workloads | Desktop | Desktop | Data center |
| Year | 2018 | 2020 | | | 2017 |
| Manufacturing | 12nm | 12nm | 12nm | | |
| Architecture | Turing | Ampere | Ampere | Ampere | Volta |
| Max Power | 70 watts | 150 watts | | 165 watts | 250/300 watts |
| GPU Mem | 16GB GDDR6 | 24GB GDDR6 | 48GB GDDR6 | 24GB HBM2 | 16/32GB HBM2 |
| GPU Mem BW | 400 GB/s | 600 GB/s | | 933 GB/s | 900 GB/s |
| Interconnect | PCIe Gen3 32 GB/s | PCIe Gen4 64 GB/s | | PCIe Gen4 64 GB/s, NVLINK 200 GB/s | PCIe Gen3 32 GB/s, NVLINK 300 GB/s |
| FP32 | 8.1 TFLOPS | 31.2 TFLOPS | | 10.3 TFLOPS | 14/15.7 TFLOPS |
| BFLOAT16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| FP16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| INT8 TensorCore | | 250 TOPS | | 330 TOPS | |
| INT4 TensorCore | | | | 661 TOPS | |
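One way to read the table is to divide peak compute by memory bandwidth. The result is the number of floating-point operations a kernel must perform per byte read from GPU memory before compute, rather than bandwidth, becomes the limit (the "ridge point" of a roofline model). The sketch below uses only the FP32 and memory-bandwidth figures from the table; `SPECS` and `ridge_point` are hypothetical names chosen for this example, and the numbers are peak data sheet values, not measured throughput.

```python
# (FP32 TFLOPS, memory bandwidth in GB/s), copied from the table above.
SPECS = {
    "T4": (8.1, 400),
    "A10": (31.2, 600),
    "A30": (10.3, 933),
}


def ridge_point(tflops: float, mem_bw_gbs: float) -> float:
    """FLOPs per byte at which peak compute and peak bandwidth balance."""
    return (tflops * 1e12) / (mem_bw_gbs * 1e9)


for model, (tflops, bw) in SPECS.items():
    print(f"{model}: {ridge_point(tflops, bw):.1f} FLOP/byte")
```

A kernel whose arithmetic intensity is below this value is likely to be limited by memory bandwidth on that card, at least on paper.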

Datasheets:

  1. T4
  2. A10
  3. A30
  4. V100-PCIe/V100-SXM2/V100S-PCIe

3 Comparison of A100/A800/H100/H800/Ascend 910B

| | A800 (PCIe/SXM) | A100 (PCIe/SXM) | Huawei Ascend 910B | H800 (PCIe/SXM) | H100 (PCIe/SXM) |
|---|---|---|---|---|---|
| Year | 2022 | 2020 | 2023 | 2022 | 2022 |
| Manufacturing | 7nm | 7nm | 7+nm | 4nm | 4nm |
| Architecture | Ampere | Ampere | HUAWEI Da Vinci | Hopper | Hopper |
| Max Power | 300/400 watts | 300/400 watts | 400 watts | | 350/700 watts |
| GPU Mem | 80GB HBM2e | 80GB HBM2e | 64GB HBM2e | 80GB HBM3 | 80GB HBM3 |
| GPU Mem BW | | 1935/2039 GB/s | | | 2/3.35 TB/s |
| Interconnect | NVLINK 400 GB/s | PCIe Gen4 64 GB/s, NVLINK 600 GB/s | HCCS 392 GB/s | NVLINK 400 GB/s | PCIe Gen5 128 GB/s, NVLINK 900 GB/s |
| FP32 | | 19.5 TFLOPS | | | 51/67 TFLOPS |
| TF32 (TensorFloat) | | 156/312 TFLOPS | | | 756/989 TFLOPS |
| BFLOAT16 TensorCore | | 156/312 TFLOPS | | | |
| FP16 TensorCore | | 312/624 TFLOPS | 320 TFLOPS | | 1513/1979 TFLOPS |
| FP8 TensorCore | Not supported | Not supported | | | 3026/3958 TFLOPS |
| INT8 TensorCore | | 624/1248 TOPS | 640 TOPS | | 3026/3958 TOPS |

H100 vs. A100 in a nutshell: roughly 3x the performance at 2x the price.
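The rule of thumb above can be checked against the table with a little arithmetic. The sketch below is illustrative only: where a table cell lists two values, it takes the larger one (an assumption made for this example), `speedups` is a hypothetical helper, and `PRICE_RATIO` is simply the "2x price" estimate from the text, not a quoted price.

```python
# Figures copied from the table above. Where a cell lists two values,
# the larger one is used (an assumption made for this sketch).
A100 = {"fp32_tflops": 19.5, "fp16_tflops": 624, "mem_bw_gbs": 2039}
H100 = {"fp32_tflops": 67, "fp16_tflops": 1979, "mem_bw_gbs": 3350}

# The "2x price" estimate from the text, not an actual price quote.
PRICE_RATIO = 2.0


def speedups(new: dict, old: dict) -> dict:
    """Ratio new/old for every metric present in `old`."""
    return {key: new[key] / old[key] for key in old}


for metric, ratio in speedups(H100, A100).items():
    print(f"{metric}: {ratio:.2f}x, {ratio / PRICE_RATIO:.2f}x per unit price")
```

On these figures the compute ratios land a little above 3x while memory bandwidth improves by less than 2x, so how much of the gain a workload sees depends on whether it is compute-bound or bandwidth-bound.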

Datasheets:

  1. A100
  2. H100
  3. Huawei Ascend-910