2022.10.18

NVIDIA A30 vs. T4: A Previous-Generation GPU Comparison


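This fall, NVIDIA released its latest GPUs, built on the new Hopper and Ada Lovelace architectures. Before we dig into the details of the newest technology, let us look back at the footprint of NVIDIA's previous microarchitectures.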

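In 2018, NVIDIA introduced the energy-efficient T4, whose multi-precision performance made it a strong fit for inference at the edge. Three years later, the A30 entered the market as one of the most versatile mainstream compute GPUs, designed for AI inference. Let us look at how the two differ.
The short table below compares a few specifications; it does not include performance results.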

Specification	A30	T4
Process technology	7nm	12nm
PCIe	Gen4 x16	Gen3 x16
TDP (max thermal design power)	165W	70W
Size (mm)	268 x 111 (FHFL)	170 x 69 (HHHL, low profile)
Slot width	Double	Single

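With the advances in process technology and PCIe speed, the A30 is expected to deliver a substantial jump in performance, and the measured results bear this out. A higher TDP and a larger card also mean that the platforms hosting these GPUs have to evolve. The latest AEWIN platforms that support the A30 and T4 are listed at the end of this tech blog for reference.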
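To put the PCIe row of the table in numbers, here is a minimal Python sketch of the theoretical one-direction bandwidth of an x16 link. The per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4) and the 128b/130b encoding come from the PCIe specifications rather than from this article, and the function name is ours, for illustration only. Real-world throughput is lower because of protocol overhead.

```python
# Theoretical one-direction PCIe bandwidth for a given generation and lane count.
# Assumption: per-lane rates and 128b/130b encoding as defined for PCIe 3.0/4.0.
GT_PER_S = {"Gen3": 8.0, "Gen4": 16.0}   # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130           # 128 payload bits per 130 bits on the wire


def pcie_bandwidth_gb_s(generation: str, lanes: int = 16) -> float:
    """Return the theoretical one-direction bandwidth in GB/s."""
    gigabits_per_second = GT_PER_S[generation] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_second / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen in ("Gen3", "Gen4"):
        print(f"{gen} x16: {pcie_bandwidth_gb_s(gen):.2f} GB/s")
```

Under these assumptions the link works out to roughly 15.75 GB/s for Gen3 x16 and about 31.5 GB/s for Gen4 x16, so the A30's host interface has twice the theoretical bandwidth of the T4's.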
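Beyond these improvements, NVIDIA also brought several innovations to the A30, including its Tensor Core architecture.
Tensor Core architecture: Ampere vs. Turing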

Extra Enhancement	Ampere	Turing
Multi-precision computing	FP64, TF32, FP32, BF16, FP16, INT8, INT4	FP32, FP16, INT8, INT4

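The T4 uses NVIDIA Turing Tensor Cores to deliver multi-precision performance (FP32, FP16, INT8, and INT4) that accelerates a wide range of modern applications, including machine learning, deep learning, and virtual desktops. The A30 uses NVIDIA Ampere Tensor Cores, which add Tensor Float 32 (TF32), BFloat16 (BF16), and higher-performance double-precision FP64 on top of those precisions. Besides TF32 and BF16, the A30 also supports the new Multi-Instance GPU (MIG) feature. Let us take a closer look at each.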

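Tensor Float 32
TF32 is a math mode for the matrix math at the heart of AI and HPC applications. As the illustration below shows, TF32 uses the same 10-bit mantissa as FP16 and the same 8-bit exponent as FP32, so it covers the FP32 numeric range while keeping enough precision for these workloads. For deep learning in TF32, the A30 delivers up to 10x the performance of the NVIDIA T4 with no code changes.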

[Figure: TF32 format, combining the 8-bit exponent of FP32 with the 10-bit mantissa of FP16]
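The TF32 layout described above (1 sign bit, 8 exponent bits, 10 mantissa bits) can be approximated in plain Python by clearing the low 13 mantissa bits of an FP32 value. This is only a sketch to build intuition: the helper names are ours, and simple truncation is an assumption, since the rounding actually applied by the Tensor Cores may differ.

```python
import struct

# FP32 has 23 mantissa bits; TF32 keeps the top 10, so the low 13 are cleared.
TF32_MASK = 0xFFFFE000


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_fp32(bits: int) -> float:
    """Interpret a 32-bit pattern as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_tf32(x: float) -> float:
    """Truncate x to TF32 precision (sign + 8-bit exponent + 10-bit mantissa)."""
    return bits_to_fp32(fp32_bits(x) & TF32_MASK)


if __name__ == "__main__":
    print(to_tf32(1.0 + 2**-10))  # smallest step above 1.0 that TF32 keeps
    print(to_tf32(1.0 + 2**-11))  # too fine for 10 mantissa bits, collapses to 1.0
    print(to_tf32(1e38))          # far beyond the FP16 range, still finite
```

The last line illustrates the point of the 8-bit exponent: values far beyond what FP16 can represent survive the conversion, while only the fine mantissa detail is lost.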

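BFloat16
As for BF16, as mentioned in our previous tech blog, it is essentially FP32 with a truncated significand. It offers FP16-class performance with the dynamic range of FP32 while using half the memory of FP32. The reduced memory traffic in turn allows faster execution.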
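As a rough illustration of "FP32 with a truncated significand", the sketch below converts an FP32 value to BF16 by keeping only its upper 16 bits, then expands it back. The helper names are hypothetical, and truncation is used for simplicity; actual hardware and libraries typically round to nearest instead.

```python
import struct

FP16_MAX = 65504.0  # largest finite FP16 value, for comparison


def to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern: the upper half of the FP32 pattern."""
    fp32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return fp32 >> 16


def from_bf16_bits(bits: int) -> float:
    """Expand a BF16 pattern back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e38):
        bits = to_bf16_bits(value)
        print(f"{value!r} -> 0x{bits:04X} -> {from_bf16_bits(bits)!r}")
```

Each value now occupies two bytes instead of four, and a number such as 1e38, far above the FP16 maximum of 65504, still round-trips to a finite value. The cost is precision: 3.14159 comes back as 3.140625.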

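Multi-Instance GPU
The new Multi-Instance GPU (MIG) feature lets a GPU based on the NVIDIA Ampere architecture be partitioned into multiple isolated GPU instances, each with its own dedicated compute and memory resources. The A30 also provides 933GB/s of memory bandwidth, almost three times that of the T4 (320GB/s).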

Performance comparison
The performance-related specifications compare as follows.

Specification	A30	T4
CUDA cores	3584	2560
Tensor Cores	224	320
Double precision (FP64) TFLOPS	5.2	0.25
Tensor Float 32 (TF32) TFLOPS	82/165*	N/A
Single precision (FP32) TFLOPS	10.3	8.1
Tensor performance (BFloat16) TFLOPS	165/330*	N/A
Half precision (FP16) TFLOPS	165/330*	65
Integer (INT8) TOPS	330/661*	130
Integer (INT4) TOPS	661/1321*	260
Memory bandwidth	933GB/s	320GB/s

* With sparsity
AEWIN has validated the A30 and T4 on AEWIN platforms including the SCB-1932C, SCB-1937C, and BIS-3101. The results are close to NVIDIA's published benchmarks.
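For a quick sense of scale, the following sketch turns a few rows of the table into A30-to-T4 ratios and a rough peak-per-watt figure. It uses the dense (non-sparsity) A30 numbers and the TDP values quoted earlier in this post. The dictionary layout and function names are ours, and peak throughput divided by TDP is only a coarse indicator, not a measured efficiency.

```python
# Peak figures copied from the tables in this post (dense values for the A30).
TDP_WATTS = {"A30": 165, "T4": 70}
PEAK = {
    "FP64 TFLOPS": {"A30": 5.2, "T4": 0.25},
    "FP32 TFLOPS": {"A30": 10.3, "T4": 8.1},
    "FP16 TFLOPS": {"A30": 165, "T4": 65},
    "INT8 TOPS": {"A30": 330, "T4": 130},
    "INT4 TOPS": {"A30": 661, "T4": 260},
    "Memory bandwidth GB/s": {"A30": 933, "T4": 320},
}


def speedup(metric: str) -> float:
    """Ratio of the A30 peak figure to the T4 peak figure for one metric."""
    return PEAK[metric]["A30"] / PEAK[metric]["T4"]


def per_watt(metric: str, gpu: str) -> float:
    """Peak figure divided by the card's TDP (a rough efficiency indicator)."""
    return PEAK[metric][gpu] / TDP_WATTS[gpu]


if __name__ == "__main__":
    for metric in PEAK:
        print(f"{metric}: A30/T4 = {speedup(metric):.2f}x")
    for gpu in TDP_WATTS:
        print(f"{gpu} FP16 per watt: {per_watt('FP16 TFLOPS', gpu):.2f}")
```

The memory bandwidth ratio comes out at about 2.92x, matching the "almost three times" noted above, while the FP64 ratio is by far the largest, which reflects the A30's positioning for HPC as well as AI.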

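Target markets: mainstream compute/inference vs. machine learning/deep learning/inference
We have now compared the A30 and the T4. From architecture to performance, they belong to two different classes of GPU card and target different markets. According to NVIDIA's data center GPU announcements, the A30 is aimed at mainstream enterprise workloads such as AI inference, training, and high-performance computing (HPC), while the T4 focuses on edge inference, with the advantages of a small form factor and low power consumption.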

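AEWIN platforms range from edge platforms and general-purpose computing systems to high-performance servers, so customers can pick the platform that fits best and equip it with the GPU each application needs. Two recommended AEWIN edge AI models are the SCB-1932C and SCB-1937C, both 2U, dual-socket (2P) servers that support 2x FHFL GPUs and 4x NICs. To learn more, please feel free to contact our friendly sales team!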

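SCB-1932C: 2U edge server with dual 3rd Gen Intel® Ice Lake-SP processors, 2x dual-slot Gen4 x16 FHFL GPU cards, and 4x PCIe Gen4 x8 slots for NICs, accelerators, and NVMe SSDs
SCB-1937C: 2U edge server with dual AMD EPYC™ 7000 series processors, 2x dual-slot Gen4 x16 FHFL GPU cards, and 4x PCIe Gen4 x8 slots for NICs, accelerators, and NVMe SSDs
BIS-3101: Desktop workstation with an Intel 8th/9th Gen Core i processor and 1x dual-slot Gen3 x16 FHFL GPU card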
