NVIDIA NVLink
What is NVIDIA NVLink?
NVIDIA NVLink is a high-speed, direct, bidirectional interconnect between graphics processing units (GPUs), developed by NVIDIA to overcome the limitations of traditional PCI Express (PCIe). NVLink provides significantly higher bandwidth and lower latency for data transfers between devices, which is especially critical for scalable artificial intelligence (AI), high-performance computing (HPC), and large-model workloads.
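To make "direct communication between GPUs" concrete, here is a minimal sketch (not a definitive implementation) that uses the CUDA runtime API to check and enable peer-to-peer access between two GPUs. When NVLink is present it carries this peer traffic; otherwise PCIe P2P is used where available. The device indices 0 and 1 are assumptions, and error checking is omitted for brevity.

```cpp
// check_p2p.cu - minimal sketch: query whether two GPUs can access each
// other's memory directly (over NVLink when present, otherwise PCIe P2P).
// Assumes devices 0 and 1; adjust for your system. Error checks omitted.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount < 2) {
        std::printf("Need at least two GPUs for peer-to-peer access.\n");
        return 0;
    }

    int canAccess01 = 0, canAccess10 = 0;
    // cudaDeviceCanAccessPeer reports whether direct peer access is possible;
    // by itself it does not distinguish NVLink from PCIe P2P.
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    std::printf("GPU0 -> GPU1 peer access: %s\n", canAccess01 ? "yes" : "no");
    std::printf("GPU1 -> GPU0 peer access: %s\n", canAccess10 ? "yes" : "no");

    if (canAccess01 && canAccess10) {
        // Enable direct access in both directions so kernels and copies
        // running on one GPU can touch memory allocated on the other.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        std::printf("Peer access enabled in both directions.\n");
    }
    return 0;
}
```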
Key facts about NVIDIA NVLink:
- Performance and Scalability: The latest, fifth-generation NVLink supports up to 1.8 TB/s of aggregate bidirectional bandwidth per Blackwell GPU, using up to 18 links of 100 GB/s each. This is 2x the bandwidth of the previous generation and more than 14x that of PCIe Gen5 (see the bandwidth-measurement sketch after this list).
- Topology and Architecture: NVLink is a set of direct, multi-lane, point-to-point links that interconnect GPUs in mesh-style topologies, with support for cache coherence and a unified memory space across devices. This structure eliminates PCIe bottlenecks and enables fast interconnection of large GPU clusters.
- Compatibility and Usage: NVLink is integrated into modern NVIDIA GPUs (e.g., A100, H100/H200, Blackwell) and server platforms (e.g., NVIDIA HGX), and is tightly coupled with high-speed NVSwitch switches that extend NVLink connectivity between GPUs beyond a single server to full racks and rows of racks.
- Applications: NVLink dramatically accelerates the training and inference of large language models, scientific simulations, graphics tasks, and any computationally intensive operation that requires fast data transfer and memory sharing between multiple GPUs.
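To see the bandwidth difference mentioned above on a concrete system, a sketch like the one below times repeated device-to-device copies between two GPUs using cudaMemcpyPeerAsync and CUDA events. The buffer size, repetition count, and device indices are assumptions; NVIDIA's p2pBandwidthLatencyTest CUDA sample is the more careful reference benchmark. Note that this measures one direction only, while NVIDIA's quoted NVLink figures are bidirectional aggregates.

```cpp
// p2p_bandwidth.cu - minimal sketch: time GPU0 -> GPU1 copies with
// cudaMemcpyPeerAsync and CUDA events. Assumes devices 0 and 1 support
// peer access; buffer size and repetition count are arbitrary choices.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256ull << 20;  // 256 MiB per transfer (assumption)
    const int reps = 20;

    void *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Enable direct peer access in both directions so the copies can take
    // the NVLink/PCIe P2P path instead of staging through host memory.
    cudaDeviceEnablePeerAccess(0, 0);
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up copy, then a timed loop on device 0's default stream.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i) {
        cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, 0);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * reps / (ms / 1000.0) / 1e9;
    std::printf("GPU0 -> GPU1: %.1f GB/s (unidirectional)\n", gbps);

    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    cudaSetDevice(0);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```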
Technical Features:
- Each NVLink generation raises the per-lane signaling rate while reducing the number of lanes per link to improve efficiency (e.g., NVLink 3.0 runs at up to 50 Gbit/s per lane).
- Supports up to 18 NVLink links per GPU, for aggregate bandwidth of up to 900 GB/s on Hopper and 1.8 TB/s on Blackwell (see the arithmetic after this list).
- Historical context: NVLink was announced in 2014 to overcome PCIe bottlenecks and first shipped in Pascal-architecture GPUs (2016), with IBM adopting it for servers built around POWER CPUs.
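As a quick cross-check of the figures above, using the per-link rates stated earlier: 18 links x 50 GB/s per link = 900 GB/s of aggregate bidirectional bandwidth on Hopper (NVLink 4), and 18 links x 100 GB/s per link = 1.8 TB/s on Blackwell (NVLink 5).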