Ampere Architecture: Next-Generation GPU Accelerated Computing for Embedded Systems

Ampere Architecture – GPU Enhancements for Mission-Critical Applications

Autonomous vehicles such as UAVs require the most efficient computing systems available to ensure accurate vision and perception in SWaP (Size, Weight, and Power)-restricted environments without sacrificing performance. High-Performance Computing (HPC) systems running at the edge require the most powerful GPGPU processing per watt to ensure advanced AI and machine learning algorithms can execute their expected workflows.

Most importantly, lives depend on the systems used in mission-critical environments. The computer systems performing sensor analysis and mathematical calculations require extreme accuracy to ensure missions are safe and successful. AI and Machine Learning (ML) runtimes for radar tracking systems, weapon systems, and ISR operations need optimized hardware to ensure that precision can be delivered in a timely manner.

NVIDIA’s Ampere architecture brings many benefits to embedded systems, providing performance boosts in both traditional rendering applications and GPGPU operations such as target detection using both conventional and AI/ML methods. With increased support for “AI at the Edge”, the technology brings new levels of computing power to data-driven applications.

The Ampere architecture supports the newest high-speed graphics memory, GDDR6 with Error Correction Code (ECC). ECC memory safeguards data integrity and reliability, which especially benefits SIGINT, Electronic Warfare, and other digital signal processing applications.

The new third-generation Tensor Cores expand on deep learning matrix arithmetic to accelerate neural network training and AI inference, allowing even the most complex models to be inferenced at the edge. The baseline performance of these new cores provides a 2-3x improvement over the previous Turing-generation Tensor Cores. Additionally, the Ampere Tensor Cores support the new hybrid Tensor Float 32 (TF32) data type, unlocking the numerical range of FP32 while keeping the efficiency of FP16. This new data type, along with the baseline improvements and the new sparsity feature, can provide upwards of 10-20x the AI/ML model processing throughput of previous-generation Tensor Cores, all with no changes to the underlying software.
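To illustrate how TF32 combines FP32’s numerical range with FP16-level precision, here is a minimal Python sketch (not NVIDIA code) that emulates TF32 rounding by truncating an FP32 value’s 23-bit mantissa down to TF32’s 10 bits while leaving the 8-bit exponent untouched. Note that real Tensor Core hardware rounds to nearest rather than truncating, so this is a simplification:

```python
import struct

def to_tf32(x: float) -> float:
    """Emulate TF32 precision: keep FP32's 8-bit exponent (same
    numerical range) but only a 10-bit mantissa (FP16-like precision)
    by zeroing the low 13 of the 23 FP32 mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= ~((1 << 13) - 1)  # truncate mantissa from 23 to 10 bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Large magnitudes survive (FP32 range); FP16 would overflow above ~65504
print(to_tf32(1e20))          # still approximately 1e20
# 1 + 2^-10 sits exactly on a TF32-representable value
print(to_tf32(1.0 + 2**-10))
# Finer detail than 10 mantissa bits is dropped
print(to_tf32(1.0001))        # collapses to 1.0
```

The key point the sketch makes concrete is that TF32 sacrifices only low-order mantissa bits, which deep learning workloads tolerate well, while avoiding FP16’s narrow dynamic range.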

With up to 2x the throughput of the previous generation and the ability to run ray tracing concurrently with shading or denoising, the second-generation RT Cores deliver massive speedups for real-time rendering. This technology also accelerates the rendering of ray-traced motion blur for faster results with greater visual accuracy, which could have a large impact in the simulation and radar processing fields.

With generational upgrades to both RT Cores and Tensor Cores, Ampere GPUs are an ideal solution for computationally intensive, AI-accelerated applications such as raw video rendering and streaming, image analysis, object tracking, and motion detection. This comes in addition to a 2-3x performance-per-watt improvement to the base streaming multiprocessor, making the Ampere architecture’s CUDA® cores roughly twice as power efficient as previous-generation Turing GPUs.

PCIe Gen 4 – Double the Bandwidth

The Ampere family of GPUs is the first to support PCI Express Gen 4, which delivers double the bandwidth of PCIe 3.0 buses. PCIe Gen 4 runs at a bit rate of up to 16 Gigatransfers/second per lane, with a x16 PCIe 4.0 slot providing up to 32 GB/s of peak bandwidth. This extra bandwidth allows customers to take full advantage of the latest-generation architecture’s processing speeds.
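The quoted figures can be sanity-checked with a little arithmetic. The sketch below assumes the 128b/130b line encoding used by both PCIe Gen 3 and Gen 4 and counts a single direction of the link:

```python
def pcie_bandwidth_gbs(gt_per_s: float, lanes: int) -> float:
    """Peak one-direction PCIe bandwidth in GB/s for a link using
    128b/130b line encoding: raw bit rate per lane x lane count,
    scaled by encoding efficiency, converted from bits to bytes."""
    return gt_per_s * lanes * (128 / 130) / 8

# PCIe 3.0 x16: 8 GT/s per lane -> ~15.8 GB/s
gen3 = pcie_bandwidth_gbs(8, 16)
# PCIe 4.0 x16: 16 GT/s per lane -> ~31.5 GB/s (the "up to 32 GB/s" peak)
gen4 = pcie_bandwidth_gbs(16, 16)
print(round(gen3, 1), round(gen4, 1))  # 15.8 31.5
```

Since the encoding overhead is identical across the two generations, doubling the per-lane bit rate doubles the usable bandwidth exactly, which is the “double the bandwidth” claim above.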

PCIe Gen 4 improves data-transfer speeds from CPU systems that support Gen 4, ensuring the transfer bus doesn’t throttle data-intensive tasks such as graphics rendering, AI/ML, data science, and other image/sensor analysis. RDMA (Remote Direct Memory Access) using NVIDIA GPUDirect® further alleviates bottlenecks caused by CPU memory inefficiencies by allowing other RDMA-enabled devices to transfer data directly to GPU memory, fully unlocking the potential of the PCIe Gen 4 bus. Together this delivers a powerful video graphics and AI processor solution for mission-critical applications, with both the processor and the bus able to handle the large amounts of data now generated by hardware at the edge.

The inclusion of PCIe Gen 4 is even more important to embedded systems in the defense market, as the newly released Revision 1.0 of the Sensor Open Systems Architecture (SOSA) Technical Standard introduces new profiles requiring Gen 4 PCIe for SOSA-aligned HPEC systems.

OpenVPX Hardware with Ampere GPU

Condor GR5-A2000 3U VPX

EIZO is first to market with a rugged, Ampere-based OpenVPX 3U form factor HPC graphics and GPGPU card. The Condor GR5-A2000 is an embedded 3U VPX HPEC graphics and AI-enabled GPGPU card built on the power-efficient NVIDIA Ampere RTX A2000 GPU. The A2000 embedded GPU variant hosts 2560 NVIDIA CUDA Cores, 20 RT Cores, and 80 Tensor Cores, along with the industry’s best H.265 (HEVC) / H.264 (MPEG-4 AVC) encode and decode engines.

The Condor GR5-A2000 supports traditional DisplayPort and Single-Link DVI outputs, and additionally supports 2x 3G-SDI outputs for integration in the most rugged environments. The GR5-A2000 is the first GPU card to market supporting PCIe Gen 4, unlocking the true potential of newer-generation CPU and payload cards while maintaining a SWaP-optimized maximum power footprint of 70 W, with configurable down-clocking to enable even lower power footprints.


Explore our OpenVPX Products!