Understanding RDMA: Deploying GPUs Using NVIDIA GPUDirect with RDMA in HPEC Systems

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter data to accelerate the creation of images and graphics. Originally developed for rendering graphics in computer games and visual effects, GPUs have evolved into powerful parallel processors capable of performing a wide range of computationally intensive tasks. The primary function of a GPU is to perform calculations in parallel, breaking complex tasks into smaller, independent operations that can be executed concurrently. GPUs are particularly effective at handling large datasets and repetitive calculations because of their massive computational power and memory bandwidth.

High-performance embedded computing (HPEC) systems used in remote locations or mobile applications such as unmanned aerial vehicles (UAVs) and electronic countermeasures deal with vast amounts of data from various sources, including sensors, satellites, communications, and intelligence gathering. GPUs can analyze large datasets in parallel, enabling rapid data fusion, pattern recognition, and advanced analytics. Their advanced graphics processing capabilities also make them ideal for visualizing complex data and generating high-quality visual outputs. In applications where data visualization is crucial for operators and decision-makers, GPUs enable the rendering of 3D models, maps, radar displays, and other graphical representations.

RDMA & How It Works

Remote Direct Memory Access (RDMA) is a technology that allows devices to directly access the memory of remote devices without involving the CPU. When RDMA is used in conjunction with GPUs, it enables efficient data transfer between GPUs and other devices in the HPEC system, such as network adapters or storage systems.

How RDMA Works with GPUs:

RDMA-Capable Network Adapters or Interconnects: To enable RDMA with GPUs, the devices involved in the data transfer need to have RDMA-capable network adapters and interconnects. These adapters and interconnects provide the necessary hardware support for RDMA operations. Examples of RDMA-capable interconnects include InfiniBand and Ethernet with RDMA over Converged Ethernet (RoCE).

GPU Memory Registration: Before data transfer can occur, the GPU memory involved in the transfer needs to be registered for RDMA. This process establishes the memory regions that are directly accessible to other devices. The GPU driver typically provides APIs that allow the application or framework to register the GPU memory for RDMA.

Data Transfer Initiation: When transferring GPU data to or from another device, the GPU initiates the data transfer request. This request is typically triggered by the application or framework running on the GPU.

RDMA Operations: Once the data transfer request is initiated, the RDMA-capable network adapter takes over. It performs RDMA operations to directly access the memory of the remote device without CPU intervention.

Completion Notification: Once the RDMA operation is completed, the network adapter notifies the GPU about the completion of the data transfer. This notification allows the GPU to continue with subsequent computations or initiate further data transfers.

By using RDMA, data transfers between GPUs and other devices can bypass the CPU and system memory, resulting in reduced latency and increased bandwidth. It also leaves the CPU free to perform other tasks, which can improve system efficiency. This direct memory access enables efficient and high-speed data movement, enhancing overall system performance in GPU-accelerated applications.
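The sequence above can be sketched as a small simulation. Everything here is an illustrative stand-in: the class and method names (`GpuMemory`, `RdmaNic`, `rdma_write`) are invented for this sketch, and a real system would use the CUDA runtime and an RDMA verbs library (such as libibverbs) rather than Python objects. The point is only to show the order of operations: register memory first, then transfer directly between registered buffers, then observe a completion notification.

```python
# Toy simulation of the RDMA-with-GPU flow described above.
# All names here are illustrative stand-ins, not a real API.

class GpuMemory:
    """Stands in for a GPU buffer that can be registered for RDMA."""
    def __init__(self, size):
        self.data = bytearray(size)
        self.registered = False

class RdmaNic:
    """Stands in for an RDMA-capable network adapter."""
    def __init__(self):
        self.completions = []

    def register(self, mem):
        # Step 2: register the memory region so the NIC can access it directly.
        mem.registered = True
        return mem

    def rdma_write(self, src, dst, payload):
        # Steps 3-4: move bytes directly between registered buffers,
        # with no CPU-side staging copy in between.
        assert src.registered and dst.registered, "memory must be registered first"
        dst.data[:len(payload)] = payload
        # Step 5: post a completion notification for the GPU to consume.
        self.completions.append(("write_done", len(payload)))

# Usage: register two buffers, then perform one direct transfer.
nic = RdmaNic()
local = nic.register(GpuMemory(16))
remote = nic.register(GpuMemory(16))
nic.rdma_write(local, remote, b"sensor-frame")
```

Attempting `rdma_write` on an unregistered buffer fails the assertion, mirroring the requirement that memory registration precede any RDMA operation.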


GPUDirect with RDMA is a technology developed by NVIDIA that enables direct memory access between NVIDIA GPUs and other devices, such as network adapters, storage systems, and other GPUs. It allows for efficient data transfers without involving the CPU, resulting in reduced latency and increased bandwidth. Traditionally, data transfer between a GPU and another device involves multiple steps. The data is first copied from the GPU to the CPU’s memory, then sent to the target device, and finally copied from the CPU’s memory on the target device to the destination GPU, storage, or application. This process makes multiple copies of the data, introducing overhead and latency that limit overall system performance.

GPUDirect with RDMA eliminates these steps by establishing a direct communication path between GPUs and other devices. It enables GPUs to directly access the memory of other GPUs or devices without CPU involvement. By bypassing the CPU and system memory, GPUDirect with RDMA significantly reduces latency and improves bandwidth, resulting in faster data transfers and improved system performance. To leverage GPUDirect with RDMA, both the source and target devices must be connected through RDMA-capable network adapters or interconnects, and the GPUs involved in the data transfer must support GPUDirect RDMA operations.
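The difference between the two data paths can be made concrete by counting the CPU-side staging copies each one makes. The sketch below is purely illustrative: real transfers go through the CUDA runtime and an RDMA NIC, not Python byte copies, and the function names are invented for this comparison. It simply models the staged path (GPU, host memory, wire, host memory, GPU) against the direct path, where the NIC reads and writes GPU memory itself.

```python
# Toy copy-count comparison of the two data paths described above.
# Illustrative only: the point is the staging copies that GPUDirect removes.

def staged_path(payload):
    """Traditional path: GPU -> host memory -> wire -> host memory -> GPU."""
    copies = 0
    host_src = bytes(payload); copies += 1   # GPU memory to source host memory
    wire = bytes(host_src)                   # NIC DMAs from host memory to the wire
    host_dst = bytes(wire); copies += 1      # wire into destination host memory
    gpu_dst = bytes(host_dst); copies += 1   # destination host memory to GPU memory
    return gpu_dst, copies

def gpudirect_path(payload):
    """GPUDirect RDMA path: the NIC accesses GPU memory directly."""
    wire = bytes(payload)                    # NIC DMAs straight out of GPU memory
    gpu_dst = bytes(wire)                    # and straight into the remote GPU
    return gpu_dst, 0                        # no CPU-side staging copies

data = b"radar-return"
staged, n_staged = staged_path(data)
direct, n_direct = gpudirect_path(data)
```

Both paths deliver identical data; the direct path simply arrives without the three host-memory staging copies, which is where the latency and bandwidth gains come from.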

Using NVIDIA GPUDirect with RDMA in HPEC Applications

High-performance processing hardware with GPUDirect with RDMA capability can bring several benefits to HPEC applications such as electronic countermeasure systems, radar tracking, signals intelligence (SIGINT), and electro-optical/infrared (EO/IR) sensor systems. NVIDIA GPUDirect with RDMA can be used for neural network training and inference, data analytics, 3D visualization, and cloud computing.

Benefits of GPUDirect with RDMA

Real-time Data Processing: Sensor systems often deal with high-speed and real-time data streams from various sensors and sources. HPEC hardware with GPUDirect with RDMA enables efficient and accelerated data movement between GPUs and other devices, reducing latency. This capability is essential for real-time data processing and analysis, allowing these systems to quickly detect, classify, and respond to signals or threats.

High Bandwidth and Throughput: In systems where large amounts of data need to be processed and analyzed rapidly, the high bandwidth and throughput provided by GPUDirect with RDMA enable quick data ingestion, processing, and decision-making. It ensures that these systems can keep up with the demands of real-time operations.

Scalability and Flexibility: HPEC systems with GPUDirect with RDMA are designed for scalability, allowing for the integration of multiple GPUs and devices. This scalability is particularly beneficial in systems where expanding sensor capabilities and handling increasing data volumes are common requirements. The combination of HPEC hardware and GPUDirect with RDMA enables these systems to scale up to handle larger workloads and evolving mission demands effectively.

Next-Generation HPEC Hardware Supporting GPUDirect RDMA

EIZO Rugged Solutions is a Preferred member of the NVIDIA Partner Network (NPN), developing cutting-edge GPU solutions based on various NVIDIA architectures. The Condor line of rugged video graphics cards and GPUs supports NVIDIA GPU architectures that support GPUDirect with RDMA.

NVIDIA GPU architectures are packed with thousands of processing cores, called CUDA cores, which are optimized for parallel processing and work together to execute tasks simultaneously. They also include RT Cores and Tensor Cores, two key technologies in modern GPUs that enhance graphics rendering, computational performance, and AI applications. Additionally, NVIDIA GPUs support PCI Express Gen 4, GDDR5/GDDR6 graphics memory, and dedicated H.265 (HEVC) / H.264 (MPEG-4/AVC) encode and decode engines. The Condor video graphics cards are available in various form factors, such as OpenVPX (3U/6U), XMC, and PCI Express, and are also offered in a single-board computer configuration. These solutions feature customizable I/O for interfaces such as 10 Gigabit Ethernet, DisplayPort™, 3G-SDI, DVI, HDMI, VGA, RS-232, and other high-speed connectors. EIZO’s embedded video graphics and GPU solutions are capable of low-latency video capture and processing, image analysis, video encode/decode, and metadata insertion/extraction.

Our latest NVIDIA-based computing solution, the Condor GR2S-A4500-ETH, is designed with the NVIDIA A4500 GPU, featuring 16 GB of GDDR6 graphics memory and 5,888 CUDA cores. The A4500 GPU supports PCI Express Gen 4 and NVIDIA GPUDirect RDMA, and delivers up to 17.66 TFLOPS of FP32 single-precision floating-point performance per slot. The Condor GR2S-A4500-ETH also incorporates an NVIDIA ConnectX-7 SmartNIC, which enables high-speed data transfer and enhanced storage performance with NVIDIA GPUDirect RDMA and RoCE. The ConnectX-7 SmartNIC includes an embedded PCIe switch that improves board functionality and reduces the overall size, weight, and power (SWaP) footprint of the card. The card supports a 40/100Gb Ethernet data plane interface and is compatible with OpenVPX VITA connectors. The Condor GR2S-A4500-ETH has been designed in line with the SOSA technical standard slot profiles 14.6.11 and 14.6.13.

EIZO works directly with customers to design customized video graphics solutions that meet various power requirements and temperature ranges. As a Preferred member of the NPN, we have unique access to NVIDIA’s technology roadmaps and offer the highest levels of technical support to help meet the design, production, and product lifecycle requirements of embedded systems.

Condor GR2S-A4500-ETH SOSA-aligned card

Joint All-Domain Command & Control (JADC2)

Download our capabilities overview to read how EIZO is uniquely positioned to help implement JADC2 with broad capability for supporting mission-critical data.