2022.10.18

NVIDIA GPU A30 vs. T4: A Previous-Generation Comparison


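This fall, NVIDIA released its latest GPUs, built on the new Hopper and Ada Lovelace architectures. Before we dig into the details of the newest technology, let's look back at NVIDIA's previous microarchitectures.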

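In 2018, NVIDIA launched the energy-efficient T4. Its revolutionary multi-precision performance delivers excellent edge inference. Three years later, A30 entered the market as one of the most versatile mainstream compute GPUs, designed for AI inference. Let's look at how the two differ.
The short table below compares some basic specifications. Performance results are not included here.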

	A30	T4
Process technology	7nm	12nm
PCIe	Gen4 x16	Gen3 x16
TDP (maximum thermal design power)	165W	70W
Dimensions (mm)	268 x 111 (FHFL)	170 x 69 (HHHL, low profile)
Slot width	Double	Single

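A30 has a newer process technology and a faster PCIe generation, so we expected a large jump in performance. The actual results confirm this. A30 also has a higher TDP and a larger form factor, which means the platforms that host these GPUs must evolve as well. For reference, the latest AEWIN platforms that support A30 and T4 are listed at the end of this blog.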
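The PCIe generation change can be expressed in numbers. The minimal Python sketch below computes theoretical one-direction link bandwidth. It relies only on two published facts: the per-lane signaling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and the 128b/130b encoding that both generations use. The function name is hypothetical. Real throughput is lower once packet and protocol overhead are counted.

```python
# Theoretical one-direction PCIe bandwidth, computed from the per-lane
# signaling rate. Gen3 and Gen4 both use 128b/130b line encoding.
# Packet/protocol overhead is ignored, so real throughput is lower.
GT_PER_S = {3: 8.0, 4: 16.0}  # giga-transfers per second, per lane


def pcie_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Return the approximate payload bandwidth in GB/s for one direction."""
    raw_gbit = GT_PER_S[gen] * lanes       # one bit per transfer per lane
    payload_gbit = raw_gbit * 128 / 130    # strip 128b/130b encoding overhead
    return payload_gbit / 8                # bits -> bytes


t4 = pcie_bandwidth_gbps(3, 16)   # T4: PCIe Gen3 x16
a30 = pcie_bandwidth_gbps(4, 16)  # A30: PCIe Gen4 x16
print(f"T4  (Gen3 x16): {t4:.2f} GB/s")
print(f"A30 (Gen4 x16): {a30:.2f} GB/s")
```

Under these assumptions the sketch prints about 15.75 GB/s for T4 and 31.51 GB/s for A30. In other words, the Gen4 x16 link doubles the theoretical host-to-GPU bandwidth.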
Beyond these technology gains, NVIDIA introduced several innovations in A30, including its Tensor Core architecture.
Tensor Core Architecture: Ampere vs. Turing

Enhancement	Ampere	Turing
Multi-precision computing	FP64, TF32, FP32, BF16, FP16, INT8, INT4	FP32, FP16, INT8, INT4

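T4 uses NVIDIA Turing Tensor Cores to deliver revolutionary multi-precision performance (FP32, FP16, INT8, and INT4). This accelerates a wide range of modern applications, including machine learning, deep learning, and virtual desktops. A30 uses NVIDIA Ampere Tensor Cores, which support all of these precisions and add three more: Tensor Float 32 (TF32), BFloat16 (BF16), and higher-performance double-precision FP64. Beyond TF32 and BF16, A30 also introduces the new Multi-Instance GPU (MIG) feature. Let's take a closer look at each.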
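One practical consequence of multi-precision support is memory footprint. The sketch below is a hypothetical helper, not an NVIDIA API. It estimates how much memory a tensor occupies in each format listed above, under two assumptions. First, TF32 data is held in ordinary 32-bit FP32 storage, because TF32 is a Tensor Core math mode rather than a storage format. Second, INT4 values are packed two per byte.

```python
# Storage cost per value, in bits, for the precisions discussed above.
# Assumption: TF32 is a Tensor Core math mode operating on FP32 data,
# so it is stored as 32 bits; INT4 is packed two values per byte.
STORAGE_BITS = {
    "FP64": 64,
    "TF32": 32,
    "FP32": 32,
    "BF16": 16,
    "FP16": 16,
    "INT8": 8,
    "INT4": 4,
}


def footprint_mib(num_values: int, fmt: str) -> float:
    """Memory needed to hold num_values elements in the given format, in MiB."""
    return num_values * STORAGE_BITS[fmt] / 8 / 2**20


num_weights = 100_000_000  # hypothetical 100M-parameter model
for fmt in STORAGE_BITS:
    print(f"{fmt:>4}: {footprint_mib(num_weights, fmt):8.1f} MiB")
```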

Tensor Float 32
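TF32 is a math mode for the matrix math at the core of AI and HPC applications. As the illustration below shows, TF32 uses the same 10-bit mantissa as FP16 and the same 8-bit exponent as FP32. This gives it FP32's wide numeric range while keeping enough precision for these workloads. For deep learning with TF32, A30 delivers up to 10x the performance of NVIDIA T4 without any code changes.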

[Figure: TF32 format compared with FP32 and FP16]
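The sketch below makes the TF32 layout concrete by emulating TF32 precision in plain Python. It clears the low 13 of FP32's 23 mantissa bits, leaving the 8-bit exponent and a 10-bit mantissa. This is an illustration under a simplifying assumption: it truncates, whereas hardware conversion may round to nearest. The function name `to_tf32` is hypothetical.

```python
import struct

# FP32 = 1 sign bit + 8 exponent bits + 23 mantissa bits.
# TF32 keeps the sign and the 8-bit exponent but only 10 mantissa bits,
# so the low 23 - 10 = 13 mantissa bits are cleared (truncation, no rounding).
TF32_MASK = 0xFFFFE000


def to_tf32(x: float) -> float:
    """Emulate TF32 precision by truncating an FP32 value's mantissa to 10 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & TF32_MASK))[0]


# Precision: a step of 2**-10 survives, a step of 2**-11 does not.
print(to_tf32(1 + 2**-10))  # kept
print(to_tf32(1 + 2**-11))  # truncated back to 1.0

# Range: TF32 shares FP32's 8-bit exponent, so 3e38 stays finite...
print(to_tf32(3.0e38))
# ...while the same value overflows FP16 (5-bit exponent, max 65504).
try:
    struct.pack("<e", 3.0e38)
except OverflowError:
    print("FP16 overflow")
```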

BFloat16
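As we mentioned in a previous tech blog, BF16 is essentially FP32 with a truncated significand. It brings the performance of FP16 and the dynamic range of FP32 while using half the memory. Because it also halves the memory bandwidth needed per value, it allows faster execution.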
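The "truncated FP32" description of BF16 can be shown directly. A BF16 value is the upper 16 bits of the FP32 encoding: sign, 8-bit exponent, and 7-bit mantissa. The Python sketch below uses plain truncation for simplicity, while production converters typically round to nearest even. The helper names are hypothetical.

```python
import struct


def fp32_bits(x: float) -> int:
    """Raw 32-bit pattern of x stored as IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16_bits(x: float) -> int:
    """BF16 by truncation: keep sign, 8-bit exponent, top 7 mantissa bits."""
    return fp32_bits(x) >> 16


def bf16_to_float(b: int) -> float:
    """Widen a 16-bit BF16 pattern back to FP32 by zero-filling the low half."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


x = 3.14159265
b = to_bf16_bits(x)
print(hex(b), bf16_to_float(b))  # coarser value, same exponent range as FP32
print(len(struct.pack("<f", x)), "bytes as FP32")
print(len(struct.pack("<H", b)), "bytes as BF16")
```

For π, the sketch yields the pattern 0x4049 (3.140625). That is a coarser value than FP32 holds, stored in half the bytes.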

Multi-Instance GPU
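The new Multi-Instance GPU (MIG) feature lets an NVIDIA Ampere-based GPU such as A30 be partitioned into as many as four fully isolated GPU instances. Each instance has its own compute, memory, and cache resources, so multiple workloads can run side by side without interfering with one another. Separately, A30 provides 933 GB/s of memory bandwidth, nearly three times that of T4 (320 GB/s).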

Performance Comparison
The table below compares the performance specifications of the two GPUs.

	A30	T4
CUDA cores	3584	2560
Tensor Cores	224	320
Double precision (FP64) TFLOPS	5.2	0.25
Tensor Float 32 (TF32) TFLOPS	82/165*	N/A
Single precision (FP32) TFLOPS	10.3	8.1
Tensor performance (BFloat16) TFLOPS	165/330*	N/A
Half precision (FP16) TFLOPS	165/330*	65
Integer (INT8) TOPS	330/661*	130
Integer (INT4) TOPS	661/1321*	260
Memory bandwidth	933 GB/s	320 GB/s

* With sparsity
AEWIN has validated A30 and T4 on AEWIN platforms including SCB-1932C, SCB-1937C, and BIS-3101. The results were similar to NVIDIA's benchmark results.
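The two spec tables can be read together. The sketch below computes A30-to-T4 ratios from the dense (non-sparsity) peak figures in the performance table, then normalizes them by each card's maximum TDP from the first table. These are datasheet peaks divided by TDP, not measured performance per watt. The dictionary layout and function names are illustrative choices only.

```python
# Dense (non-sparsity) peak figures from the performance table above.
# TF32 and BF16 are omitted because T4 does not support them.
A30 = {"FP64": 5.2, "FP32": 10.3, "FP16": 165, "INT8": 330, "INT4": 661, "Mem BW": 933}
T4 = {"FP64": 0.25, "FP32": 8.1, "FP16": 65, "INT8": 130, "INT4": 260, "Mem BW": 320}
TDP_W = {"A30": 165, "T4": 70}  # maximum TDP from the first table


def ratio(metric: str) -> float:
    """How many times higher A30's datasheet peak is than T4's."""
    return A30[metric] / T4[metric]


def ratio_per_watt(metric: str) -> float:
    """The same ratio after dividing each card's peak by its maximum TDP."""
    return (A30[metric] / TDP_W["A30"]) / (T4[metric] / TDP_W["T4"])


print(f"{'metric':<7}{'raw':>7}{'per W':>8}")
for metric in A30:
    print(f"{metric:<7}{ratio(metric):>6.1f}x{ratio_per_watt(metric):>7.2f}x")
```

Under these assumptions, A30's FP16, INT8, and INT4 peaks are roughly 2.5x those of T4. Once each is divided by its TDP, the advantage shrinks to about 1.08x, and A30's FP32 peak per watt is lower than T4's. This fits T4's positioning as a small, low-power inference card.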

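Target Markets: Mainstream Compute/Inference vs. Machine Learning/Deep Learning/Inference
We have now compared A30 and T4. From architecture to performance, they belong to two different classes of graphics cards, and their target markets differ as well. According to NVIDIA's data center GPU announcements, A30 mainly targets mainstream enterprise workloads such as AI inference, training, and high-performance computing (HPC). T4 focuses on edge inference, where its small size and low power consumption are advantages.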

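AEWIN platforms range from edge platforms to general-purpose computing systems to high-performance servers. Customers can therefore choose the most suitable platform and pair it with the GPU each application requires. Our two recommended edge AI models are SCB-1932C and SCB-1937C. Both are 2U, dual-processor (2P) servers that support 2x FHFL GPUs and 4x NICs. To learn more, feel free to contact our friendly sales team!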

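SCB-1932C: 2U edge server with dual 3rd Gen Intel® Ice Lake-SP processors, 2x dual-slot PCIe Gen4 x16 FHFL GPU cards, and 4x PCIe Gen4 x8 slots for NICs, accelerators, and NVMe SSDs
SCB-1937C: 2U edge server with dual AMD EPYC™ 7000 series processors, 2x dual-slot PCIe Gen4 x16 FHFL GPU cards, and 4x PCIe Gen4 x8 slots for NICs, accelerators, and NVMe SSDs
BIS-3101: Desktop workstation with 8th/9th Gen Intel® Core™ i processors and 1x dual-slot PCIe Gen3 x16 FHFL GPU card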
