2021.02.03

Bfloat16 – A Brief Introduction

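AI workloads are computationally expensive, especially when they operate on large sets of FP32 numbers. Half-precision FP16 has only a 10-bit significand and a 5-bit exponent, so both its precision and its representable range are lower than those of FP32.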
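The FP16 limits described above can be observed directly with a small sketch. It assumes Python's standard `struct` module, whose `"e"` format code packs IEEE 754 half precision; the helper names `roundtrip_fp16` and `fits_in_fp16` are made up for this illustration.

```python
import struct


def roundtrip_fp16(x: float) -> float:
    """Store x as IEEE 754 half precision (FP16) and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed into FP16 without overflowing."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        # struct refuses finite values that are too large for FP16.
        return False


if __name__ == "__main__":
    # Between 4096 and 8192, neighbouring FP16 values are 4 apart,
    # so 4097 cannot be stored exactly.
    print(roundtrip_fp16(4097.0))
    # 65504 is the largest finite FP16 value; 70000 is out of range.
    print(fits_in_fp16(65504.0), fits_in_fp16(70000.0))
```

Values such as 70000 are unremarkable in FP32, which is why a narrow exponent field can be a problem for training workloads.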

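BFloat16 was developed by Google specifically for the needs of AI, which requires a wide numeric range but comparatively little significand precision. BFloat16 is essentially FP32 with a truncated significand: it offers performance comparable to FP16 and the dynamic range of FP32 while using half the memory of FP32. A further benefit is that existing FP32 data can be converted to BF16 easily for subsequent neural-network processing. BF16 provides just enough precision and no more. It is the right tool for the job.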
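Because BF16 keeps the sign and exponent bits of FP32, the conversion amounts to keeping the upper 16 bits of the FP32 bit pattern. The following is a minimal sketch of that idea in plain Python, not a description of how any particular hardware or library does it. The function names are hypothetical, and the round-to-nearest-even variant is one common choice that is assumed here rather than taken from the article.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fp32_to_bf16_truncate(x: float) -> int:
    """Convert to BF16 by dropping the low 16 significand bits."""
    return fp32_bits(x) >> 16


def fp32_to_bf16_round(x: float) -> int:
    """Convert to BF16 with round-to-nearest-even (assumed policy)."""
    if x != x:
        return 0x7FC0  # keep NaN a NaN instead of rounding it into infinity
    bits = fp32_bits(x)
    # Adding 0x7FFF plus the lowest kept bit rounds ties to the even neighbour.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bf16_to_fp32(bits16: int) -> float:
    """Widen a BF16 pattern back to FP32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    b = fp32_to_bf16_truncate(3.14159)
    print(hex(b), bf16_to_fp32(b))
```

Truncation is the cheapest option; rounding costs one extra addition and usually gives a smaller error. Note that `struct.pack("<f", ...)` raises `OverflowError` for finite inputs outside the FP32 range.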

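BF16 (bfloat16) format:
1 bit for the sign
8 bits for the exponent
Range: the smallest positive normal value is 2^-126 and the largest finite value is just under 2^128 (exponent bias of 127).
7 bits for the significand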
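The bit layout above can be turned into a small decoder. This is an illustrative sketch that assumes the usual IEEE-style conventions carried over from FP32: an exponent field of 0 encodes zero and subnormals, and an exponent field of 255 encodes infinity and NaN. Under those conventions the smallest positive normal value is 2^-126 and the largest finite value is (2 - 2^-7) x 2^127. The names `decode_bf16` and `bf16_value` are hypothetical.

```python
def decode_bf16(bits: int) -> tuple[int, int, int]:
    """Split a 16-bit BF16 pattern into (sign, exponent, fraction) fields."""
    sign = (bits >> 15) & 0x1      # 1 bit
    exponent = (bits >> 7) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7F         # 7 bits
    return sign, exponent, fraction


def bf16_value(bits: int) -> float:
    """Return the numeric value encoded by a BF16 bit pattern."""
    sign, exponent, fraction = decode_bf16(bits)
    factor = -1.0 if sign else 1.0
    if exponent == 0xFF:
        # All-ones exponent: infinity when the fraction is zero, else NaN.
        return factor * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:
        # Zero and subnormals: no implicit leading 1.
        return factor * (fraction / 128.0) * 2.0 ** -126
    return factor * (1.0 + fraction / 128.0) * 2.0 ** (exponent - 127)


if __name__ == "__main__":
    print(decode_bf16(0x3F80), bf16_value(0x3F80))  # fields and value of 1.0
    print(bf16_value(0x0080))  # smallest positive normal value
    print(bf16_value(0x7F7F))  # largest finite value
```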

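We are on the verge of the age of artificial intelligence. A common, fast number format designed for AI will speed up progress in the right direction. The industry has already adopted the BFloat16 format and supports it across a wide range of hardware platforms. Google, which invented the format, offers its own TPUs (Tensor Processing Units) in the cloud. Nvidia, the de facto leader in AI accelerators, has also embraced BF16 and implemented it in the Tensor Cores of its latest Ampere-based silicon. CPU giant Intel offers a dedicated solution in its Nervana accelerators and has integrated BF16 into its AVX-512 extensions, so that in use cases that depend less heavily on AI, the lighter workload can be handled by the CPU itself. ARM has likewise integrated BF16 into its SVE and Neon instructions. This matters because ARM v8 is used on a wide range of platforms, from mobile devices to infrastructure.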

Table 1 – Selected hardware with BFloat16 support

CPU	Support
1st and 2nd Gen Intel® Xeon® Scalable processors	No
3rd Gen Intel® Xeon® Scalable processors (Cooper Lake)	Yes
GPU	Support
Nvidia Volta (V100)	No
Nvidia Turing (T4)	No
Nvidia Ampere (A100)	Yes
AMD Radeon RX6000	No
AMD Radeon Instinct	Yes
