2023.10.26

GPU Benchmark Tests of Genoa, Milan and Ice Lake Platforms

In our previous blog, we announced that the AEWIN SCB-1932C server has been validated as an NVIDIA-Certified System for enterprise edge. Today we take a closer look at GPU benchmark tests across different AEWIN platforms.

System Configurations
The tests use three AEWIN high-performance appliances: SCB-1946C, SCB-1932C, and SCB-1937C.

Servers for Testing/Benchmark
System | SCB-1946C | SCB-1932C | SCB-1937C | Nvidia Benchmark
Processor | Dual AMD EPYC 9174F (Genoa) | Dual Intel Xeon Gold 5318S (Ice Lake) | Dual AMD EPYC 7543 (Milan) | Dual AMD EPYC 7003 (Milan)
Cores | 16 | 24 | 32 | N/A
Frequency | 4.1 GHz | 2.1 GHz | 2.8 GHz | N/A
Memory | 1x 32GB | 2x 32GB | 1x 32GB | N/A
GPU | 1x Nvidia A30 | 1x Nvidia A30 | 1x Nvidia A30 | 1x Nvidia A30
OS | Ubuntu 20.04.3 LTS | Ubuntu 20.04.3 LTS | Ubuntu 20.04.3 LTS | N/A
Framework | TensorRT 8.6.1 | TensorRT 8.6.1 | TensorRT 8.6.1 | TensorRT 8.6.1

GPU Status Monitor
Before testing, create a GPU monitor script, "monitor.sh", on the host so that any GPU throttling during the run can be spotted.

When prompted, enter the status refresh interval in seconds, then enter "y" to save a log or "n" to run without saving one.
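
The two prompts can be checked before the monitor loop starts. The sketch below is only an illustration and is not part of the original monitor.sh: the function names is_valid_interval and is_valid_logflag are hypothetical, and it assumes the interval is a whole number of seconds.

```shell
#!/bin/sh
# Hypothetical helpers (not in the original monitor.sh).

# Succeeds only if $1 is a positive whole number (assumed unit: seconds).
is_valid_interval() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit
        0*)          return 1 ;;  # zero, or a leading zero
        *)           return 0 ;;
    esac
}

# Succeeds only if $1 is exactly "y" or "n".
is_valid_logflag() {
    case "$1" in
        y|n) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example checks
is_valid_interval "5" && echo "interval ok"
is_valid_logflag "maybe" || echo "log flag must be y or n"
```

Checking both answers once, up front, avoids entering the loop with a value that "sleep" or the y/n branch cannot handle.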

Benchmark Test
Run the script "benchmark.sh" on the host to start the test. It starts the GPU-accelerated container and places you inside it. From the container, run the container's own "benchmark.sh". It asks you to choose between int8 and fp16 mode for the test: enter 1 for int8 or 2 for fp16.

The picture below shows an example of the benchmark results.

[Figure: example benchmark results]

For the benchmark results, we consider only the percentile value of GPU Compute. In the figure above, for example, that value is 8.88623. To calculate the performance in images/sec for any GPU, we use the formula 1000/(percentile/128), where the percentile is a latency in milliseconds and 128 is the batch size of the current test. The int8 result is therefore approximately 14,404 images/sec.
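
The formula above can be sketched as a small shell helper. This is an illustration only: the function name imgs_per_sec is hypothetical and not part of the original scripts, and it assumes the percentile value is a latency in milliseconds.

```shell
#!/bin/sh
# Hypothetical helper (not in the original scripts).
# Converts a percentile latency into images per second:
#   1000 / (percentile_ms / batch_size)
#   $1 = percentile latency in milliseconds (assumed unit)
#   $2 = batch size
imgs_per_sec() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 1000 / (p / b) }'
}

# Values from the example above: percentile 8.88623, batch size 128
imgs_per_sec 8.88623 128   # prints 14404.31
```

Passing the batch size as an argument lets the same helper be reused if the test is run with a different "--batch" value.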

Testing Script
1. Benchmark script "benchmark.sh" in the container

#!/bin/bash
# Ask which precision to test, then run trtexec on ResNet-50 with batch size 128.
echo -e "for int8 test, press 1; for fp16 test, press 2 : "
read testmode
# Compare as strings so that non-numeric input falls through to the error message.
if [ "${testmode}" = "1" ]; then
    /workspace/tensorrt/bin/trtexec --batch=128 --iterations=400 --workspace=1024 --percentile=99 --deploy=ResNet50_N2.prototxt --model=ResNet50_fp32.caffemodel --output=prob --int8
elif [ "${testmode}" = "2" ]; then
    /workspace/tensorrt/bin/trtexec --batch=128 --iterations=400 --workspace=1024 --percentile=99 --deploy=ResNet50_N2.prototxt --model=ResNet50_fp32.caffemodel --output=prob --fp16
else
    echo -e "input wrong !!!"
fi

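2. Benchmark script "benchmark.sh" on the host

#!/bin/bash
# Start an interactive TensorRT container on GPU 0, in the ResNet-50 data directory.
docker run --gpus '"device=0"' -it --rm --name trt_2011 -w /workspace/tensorrt/data/resnet50/ trt:2011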

3. Burn-in script "burn.sh" in the container

#!/bin/bash
# Repeat the fp16 ResNet run indefinitely to keep the GPU under load.
while true
do
    mpirun --allow-run-as-root -np 1 --mca btl ^openib python -u ./resnet.py --batch_size 128 --num_iter 28800 --precision fp16 --iter_unit batch
done

4. Burn-in script "burn.sh" on the host

#!/bin/bash
docker run --gpus '"device=0"' -it --rm --name tf_2011tf2 -w /workspace/nvidia-examples/cnn tf:2011tf2

5. GPU monitor script "monitor.sh" on the host

#!/bin/bash
#echo " " > ./gpu_log.txt
echo "please enter interval (sec) : "
read interval
echo "Do you want to save the log file?(y/n)"
read logflag
# Loop indefinitely, printing the GPU status and clock information each cycle.
for((i=1;i>0;i++))
do
    if [ "${logflag}" = "y" ]; then
        # Collect one sample in a temp file, show it, then append it to the log.
        echo -e "\n=====i : ${i}=====\n" > ./gpu_log_tmp.txt
        nvidia-smi >> ./gpu_log_tmp.txt
        sleep 1
        nvidia-smi -q -d CLOCK | grep -v N/A | grep -v "Not Found" >> ./gpu_log_tmp.txt
        cat ./gpu_log_tmp.txt
        cat ./gpu_log_tmp.txt >> gpu_log.txt
        sleep "${interval}"
    elif [ "${logflag}" = "n" ]; then
        echo -e "\n=====i : ${i}=====\n"
        nvidia-smi
        sleep 1
        nvidia-smi -q -d CLOCK | grep -v N/A | grep -v "Not Found"
        sleep "${interval}"
    else
        # Stop instead of repeating the error message forever.
        echo "Input error! Please enter y or n."
        exit 1
    fi
done
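
When logging is enabled, each sample in gpu_log.txt starts with a "=====i : N=====" marker line, so the number of recorded samples can be counted from those markers. The sketch below is illustrative and not part of the original scripts; the function name count_samples is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not in the original scripts).
# Counts the samples in a monitor log by matching the
# "=====i : N=====" marker line written at the start of each sample.
#   $1 = path to the log file
count_samples() {
    grep -c '^=====i : [0-9][0-9]*=====$' "$1"
}

# Example: report the sample count if a log exists in the current directory.
if [ -f ./gpu_log.txt ]; then
    count_samples ./gpu_log.txt
fi
```

Comparing this count with the expected run time divided by the chosen interval gives a quick check that the monitor kept running for the whole test.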

Summary
We verified the A30 on three platforms: SCB-1946C (Genoa), SCB-1932C (Ice Lake), and SCB-1937C (Milan). As the benchmark results show, all three deliver results similar to or better than the Nvidia benchmarks.

[Figure: benchmark results of the tested platforms compared with the Nvidia benchmark]

Our platforms range from edge AI appliances to general-purpose computing systems to high-performance servers, so customers can select the most suitable one with the GPUs each application requires. Reach out to our friendly sales team to learn more about AEWIN GPU server platforms!

  • SCB-1932: 2U Dual Ice Lake-SP PCIe 4.0 Platform with short depth design, 4x PCIe Gen4 slots plus dual FHFL GPU slots or 4x PCIe Gen4 NIC, and IPMI.
  • SCB-1933: 2U Ice Lake-SP PCIe 4.0 Platform with short depth design, 4x PCIe Gen4 slots plus dual FHFL GPU slots or 4x PCIe Gen4 NIC, and IPMI.
  • SCB-1942: 2U Dual Sapphire Rapids-SP PCIe 5.0/CXL Platform with short depth design, 4x PCIe Gen5 slots plus dual FHFL GPU slots or 4x PCIe Gen4 NIC, and IPMI.
  • SCB-1943: 2U Sapphire Rapids-SP PCIe 5.0/CXL Platform with short depth design, 4x PCIe Gen5 slots plus dual FHFL GPU slots or 4x PCIe Gen5 NIC, and IPMI.
  • SCB-1946: 2U Dual EPYC-9004 (Genoa/Bergamo) PCIe 5.0/CXL Platform with short depth design, 4x PCIe Gen5 slots plus dual FHFL GPU slots or 4x PCIe Gen4 NIC, and IPMI.
  • SCB-1947: 2U EPYC-8004 (Siena) PCIe 5.0/CXL Platform with short depth design, 8x PCIe Gen5 slots NIC, NVMe, and IPMI.
  • BAS-6101A: 2U High-Density Edge Computing Server with AMD Bergamo/Genoa/Genoa-X processor, total 8x PCIe slots (2x dual width FHFL PCIe Gen5 x16 or 4x single width FHFL PCIe Gen5 x16, 2x single width FHHL PCIe Gen5 x16, 2x HHHL PCIe Gen4 x8) + 1x OCP 3.0 slot for NICs and Accelerators.
  • BAS-6101B: 2U High-Performance Server with AMD Bergamo/Genoa/Genoa-X processor, total 8x PCIe slots (2x dual width FHFL PCIe Gen5 x16 or 4x single width FHFL PCIe Gen5 x16, 2x single width FHHL PCIe Gen5 x16, 2x HHHL PCIe Gen4 x8) for NICs and Accelerators.
