AEWIN Edge AI Server Performs GPU Scaling with InfinitiesSoft AI-Stack

Introduction

As technology develops rapidly, AI has been integrated into a wide range of verticals, including Smart City, ITS, Smart Manufacturing, Smart Medical, and more. In this blog, we test a solution that integrates AEWIN hardware with the AI-Stack software from our partner InfinitiesSoft to accelerate GPU load balancing and scaling for inferencing.

Test Setup

AEWIN's SCB-1932 MEC is a 2U rack-mount hardware networking system. Based on dual 3rd Gen Intel® Xeon® Scalable processors, the high-performance platform supports eight-channel DDR4 registered ECC RDIMM (up to 3200 MHz) with up to 1TB of memory per CPU. It accommodates either up to eight Network Expansion Modules, or four Network Expansion Modules plus two full-height, full-length PCIe x16 slots. The maximum Ethernet bandwidth is up to 800GbE.

Integrating the AEWIN Edge AI Server with the InfinitiesSoft AI-Stack creates a high-performance platform for developing and orchestrating machine learning workloads. The Edge AI Appliance provides a development environment that helps industry build valuable applications.

System: AEWIN SCB-1932C
Processor: 2x Intel® Xeon® Gold 5318S CPU @ 2.10GHz
DIMM Slot: 16x DDR4 32GB = 512GB
OS: Ubuntu Linux 18.04 (Kernel: 5.4.0-Generic)
BIOS: C1932A003
BIOS Settings: "Above 4G Decoding": Enable; "Above 4GB MMIO BIOS assignment": Enable
PCIe Accelerator: 2x NVIDIA T4

Figure 1: Test system set-up

Figure 2: GPU load balancing and scaling

Figure 3: Run the script to activate the demo

Figure 4: Increase the number of GPUs

Figure 5: Increase the workload against the allocated GPUs, e.g. GPU-to-concurrent-user ratios of 2 to 1, 2 to 4, 2 to 8, and 2 to 10.

The results presented in Figure 5 show that the AEWIN SCB-1932C Edge AI Server with AI-Stack can perform GPU load balancing and scaling for inferencing, and that it scales linearly as we increase the workload and the number of GPUs.

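To make the idea of balancing concurrent users across GPUs concrete, here is a minimal Python sketch of one common strategy, least-loaded assignment. It is an illustration only: the scheduling policy that AI-Stack actually uses is not described in this post, and the function name assign_least_loaded and its parameters are hypothetical. The user counts are taken from the ratios listed for Figure 5.

```python
import heapq


def assign_least_loaded(num_gpus, num_users):
    """Assign each user session to the GPU with the fewest sessions so far.

    Hypothetical helper for illustration; not AI-Stack's real scheduler.
    """
    # Min-heap of (active_sessions, gpu_index); ties go to the lower index.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for user in range(num_users):
        load, gpu = heapq.heappop(heap)        # least-loaded GPU right now
        placement[gpu].append(user)
        heapq.heappush(heap, (load + 1, gpu))  # it now carries one more session
    return placement


if __name__ == "__main__":
    # Two GPUs (as in the test system) against 1, 4, 8 and 10 concurrent users.
    for users in (1, 4, 8, 10):
        placement = assign_least_loaded(2, users)
        per_gpu = {gpu: len(sessions) for gpu, sessions in placement.items()}
        print(f"{users} users -> sessions per GPU: {per_gpu}")
```

With this policy the session counts on any two GPUs never differ by more than one, which is the kind of even spread a load balancer aims for before more GPUs are added.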
Summary

In this test, we demonstrated a solution that integrates hardware (the AEWIN Intel Xeon Ice Lake SP Edge AI Server) and software (InfinitiesSoft AI-Stack) to accelerate GPU load balancing and scaling for inferencing. Together, the AEWIN Edge AI Server and AI-Stack use GPU resources more efficiently and make AI development and management quick and easy through the platform's interface. With AI-Stack installed, the AEWIN Edge AI Server can also manage multiple servers, pooling their resources for highly efficient operation and creating win-win solutions for AI applications.