AEWIN

Comparison of Previous-Generation NVIDIA GPUs: A30 vs T4


NVIDIA just released its latest GPUs built on the new Hopper and Ada Lovelace architectures this fall. Before digging deeper into the details of these latest technologies, let us revisit NVIDIA's previous microarchitectures.

Back in 2018, NVIDIA released the energy-efficient T4, whose revolutionary multi-precision capabilities delivered extraordinary performance for edge inference. Three years later, the A30 arrived on the market as one of the most versatile mainstream compute GPUs for AI inference. Let us figure out the differences between them.
Here is a brief comparison of key specifications, excluding the performance results.

                                   A30                   T4
Process Technology                 7/8nm                 14nm
PCIe                               Gen4 x16              Gen3 x16
TDP (max thermal design power)     165W                  70W
Size (mm)                          268 x 111 (FHFL)      170 x 69 (HHHL/low profile)
Socket Width                       Double                Single

With the improved process technology and faster PCIe interface, the A30 was expected to make a great leap forward in performance, which aligns with the actual results. With the higher TDP and larger size, the platforms supporting the GPU must evolve along with it; up-to-date AEWIN platforms supporting both the A30 and T4 are listed at the end of this tech blog for your reference.
In addition to the technology growth mentioned above, NVIDIA introduced several innovations in the A30, including its Tensor Core technology.
Tensor Core Technology: Ampere vs Turing

                            Ampere                       Turing
Multi-Precision Computing   FP64, TF32, FP32, BF16,      FP32, FP16,
                            FP16, INT8, INT4             INT8, INT4

The T4 is powered by NVIDIA Turing Tensor Cores, delivering revolutionary multi-precision performance (FP32, FP16, INT8, and INT4) to accelerate a wide range of modern applications, including machine learning, deep learning, and virtual desktops. The A30 is powered by NVIDIA Ampere Tensor Core technology, which supports all of the above precisions plus innovations including Tensor Float 32 (TF32), BFloat16 (BF16), and higher-performance double-precision FP64. Along with TF32 and BF16, the A30 brings the new Multi-Instance GPU (MIG) capability; let us take a closer look at each.

Tensor Float 32
TF32 is the math mode for handling matrix math in AI/HPC applications. TF32 uses the same 10-bit mantissa as FP16 and adopts the same 8-bit exponent as FP32, supporting the larger numeric range with more than sufficient margin for the precision requirements of AI workloads. Regarding TF32 deep learning performance, the A30 delivers up to 10X higher throughput than the NVIDIA T4 with zero code changes.
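As a rough sketch of the format, the bit-level effect of TF32 can be emulated in Python by masking an FP32 value down to a 10-bit mantissa. (This is an illustration only: real Tensor Cores round to nearest rather than truncate.)

```python
import struct

def tf32_round(x: float) -> float:
    """Approximate TF32: keep FP32's sign and 8-bit exponent,
    truncate the 23-bit mantissa down to TF32's 10 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFFFE000  # zero the low 13 mantissa bits (23 - 10)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(tf32_round(3.14159265))  # → 3.140625 (only ~3 decimal digits survive)
print(tf32_round(1e38))        # large values still representable: FP32 exponent range
```

Note that the exponent field is untouched, which is why TF32 keeps FP32's dynamic range while trading mantissa precision for speed.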

BFloat16
As for BF16, as we mentioned in our previous Tech Blog, it is essentially FP32 with a truncated significand: it brings the performance of FP16 and the dynamic range of FP32 while using half the memory of FP32. The reduced memory bandwidth requirement permits faster execution.
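The "FP32 with a truncated significand" description can be illustrated in a few lines of Python: simply keeping the top 16 bits of an FP32 value yields the BF16 layout (again a truncation sketch; hardware typically rounds to nearest).

```python
import struct

def to_bf16(x: float) -> float:
    """Keep only the top 16 bits of an FP32 value:
    sign (1) + exponent (8) + mantissa (7) = the BF16 layout."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

print(to_bf16(3.14159265))  # → 3.140625 (coarse mantissa)
print(to_bf16(1e38))        # still finite: FP32's dynamic range is preserved
```

The second print is the key contrast with FP16, which overflows to infinity well below 1e38 (its maximum is about 65504).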

Multi-Instance GPU
The new Multi-Instance GPU (MIG) feature allows GPUs based on the NVIDIA Ampere architecture, such as the A30, to be partitioned into up to four fully isolated GPU instances, each with its own memory, cache, and compute cores, so multiple workloads can share one GPU securely with guaranteed quality of service. The A30 also offers 933GB/s of memory bandwidth, almost three times that of the T4 (320GB/s).

Performance Comparison
A comparison of the performance-related specifications is shown below.

                                     A30           T4
CUDA Cores                           3584          2560
Tensor Cores                         224           320
Double-Precision (FP64) TFLOPS       5.2           0.25
Tensor Float 32 (TF32) TFLOPS        82/165*       N/A
Single-Precision (FP32) TFLOPS       10.3          8.1
Tensor Perf. (BFloat16) TFLOPS       165/330*      N/A
Half-Precision (FP16) TFLOPS         165/330*      65
Integer Operations (INT8) TOPS       330/661*      130
Integer Operations (INT4) TOPS       661/1321*     260
Memory Bandwidth                     933GB/s       320GB/s

* With sparsity
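To put the table in perspective, a quick back-of-the-envelope script (numbers taken from the table above) computes the A30's speedup over the T4 at the precisions both GPUs share, with and without structured sparsity:

```python
# Peak throughput from the spec table: A30 as (dense, with sparsity), T4 dense only
a30 = {"FP16": (165, 330), "INT8": (330, 661), "INT4": (661, 1321)}
t4 = {"FP16": 65, "INT8": 130, "INT4": 260}

for prec, (dense, sparse) in a30.items():
    print(f"{prec}: {dense / t4[prec]:.1f}x dense, "
          f"{sparse / t4[prec]:.1f}x with sparsity")

# Memory bandwidth ratio, 933 GB/s vs 320 GB/s
print(f"Bandwidth: {933 / 320:.1f}x")  # → 2.9x
```

At every shared precision the A30 is roughly 2.5x faster dense and about 5x faster with sparsity, which matches its positioning as the higher-tier part.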
AEWIN has verified the A30 and T4 on AEWIN platforms including the SCB-1932C, SCB-1937C, and BIS-3101; the results are in line with NVIDIA's published benchmarks.

Target market: Mainstream Compute/Inference vs ML/DL/Inference
We have seen the comparison between the A30 and T4. Judging from the architecture through to the performance, they fall into two different classes of GPU with different target markets. According to NVIDIA's positioning of its Data Center GPUs, the A30 is for mainstream enterprise workloads such as AI inference, training, and high-performance computing (HPC), while the T4 focuses on edge inference with the advantages of a compact size and low power consumption.

As AEWIN platforms range from edge platforms to general-purpose computing systems to high-performance servers, customers can select the most suitable one with the GPUs required for each application. Two recommended AEWIN Edge AI models are the SCB-1932C and SCB-1937C: 2U, dual-processor servers supporting 2x FHFL GPUs and 4x NICs. To discover more, please don't hesitate to talk to our friendly sales team!

SCB-1932C: 2U Edge Server with dual Intel® 3rd Gen Ice Lake-SP, 2x dual slot Gen 4 x16 FHFL GPU cards, 4x PCIe Gen4 x8 slots for NICs, Accelerators & NVMe SSDs
SCB-1937C: 2U Edge Server with dual AMD EPYC™ 7000 series, 2x dual slot Gen4 x16 FHFL GPU cards, 4x PCIe Gen4 x8 slots for NICs, Accelerators & NVMe SSDs
BIS-3101: Desktop Workstation with Intel 8th/9th Gen Core™ i processor and 1x dual slot Gen3 x16 FHFL GPU card