Multi-Instance GPU (MIG)

7/8/2025

What is Multi-Instance GPU?

Multi-Instance GPU (MIG) is an NVIDIA technology first introduced with the Ampere architecture and available in subsequent GPU generations (including Hopper and Blackwell). It allows a single physical GPU to be partitioned into up to seven fully isolated, independent GPU instances, each with its own dedicated memory, compute cores, and cache. This improves cost efficiency and delivers predictable quality of service for different users and workloads in scalable computing environments.
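
As a quick way to check this on a given system, the sketch below (not part of the original article) uses the NVML Python bindings to report whether MIG mode is enabled on each GPU; it assumes the nvidia-ml-py package and a MIG-capable driver are installed.

```python
# Minimal sketch: query the MIG mode of every GPU via NVML.
# Assumes the nvidia-ml-py package (imported as pynvml) is installed.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        try:
            # The current mode is in effect now; the pending mode takes
            # effect after the next GPU reset.
            current, pending = pynvml.nvmlDeviceGetMigMode(handle)
            enabled = current == pynvml.NVML_DEVICE_MIG_ENABLE
            print(f"GPU {i} ({name}): MIG enabled={enabled}, pending={pending}")
        except pynvml.NVMLError:
            # Pre-Ampere GPUs do not support MIG at all.
            print(f"GPU {i} ({name}): MIG not supported")
finally:
    pynvml.nvmlShutdown()
```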

Key features and benefits of MIG:

  • Resource Isolation: Each instance has its own dedicated resources - compute cores and GPU memory. This provides predictable performance and isolates each instance from the activity of its neighbors, which is especially important for multi-tenant environments such as cloud platforms.
  • Improved GPU utilization: Workloads that do not use the full capacity of a single GPU can run in parallel on separate instances, improving overall GPU utilization and reducing latency for users. For example, you can create instances with different amounts of dedicated memory (10 GB or more, depending on the GPU model) for different workloads.
  • Virtualization and containerization support: MIG works with Linux, Docker, and Kubernetes, and is also supported by hypervisors (Red Hat Virtualization, VMware vSphere) for use in virtual machines and containers. It can be used in bare-metal, GPU pass-through, and vGPU configurations.
  • Quality of Service (QoS) Assurance: Each instance has dedicated memory bandwidth and compute resources, which helps ensure predictable response times for its applications and prevents heavy workloads on one instance from degrading the performance of the others.
  • Limitations: MIG does not support CUDA Inter-Process Communication (IPC) between GPU instances, which limits scaling a single application across multiple instances. Graphics APIs (OpenGL, Vulkan, etc.) are not supported in MIG mode, and workloads that need a very large number of GPU cores may require a full GPU without partitioning.
  • Management: MIG instances are created and managed via the NVIDIA Management Library (NVML) and the nvidia-smi command-line tool. Enabling MIG mode requires a GPU reset, and on systems where GPU management services (such as monitoring daemons) are running, those services must be stopped before the mode change; a minimal sketch of this workflow follows this list.
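
As a rough illustration of that management workflow (not part of the original article), the sketch below drives the standard nvidia-smi MIG commands from Python. The GPU index (0) and profile ID (19) are illustrative assumptions: list the profiles available on your hardware first and run the commands with administrative privileges.

```python
# Minimal sketch: enable MIG on GPU 0 and carve it into two instances
# using the documented nvidia-smi MIG commands (run as root/administrator).
import subprocess

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Enable MIG mode on GPU 0 (takes effect after a GPU reset; stop any
#    services holding the GPU before resetting).
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# 2. List the GPU instance profiles this GPU supports.
run(["nvidia-smi", "mig", "-lgip"])

# 3. Create two GPU instances from profile 19 (an assumed 1g profile) and
#    a default compute instance inside each (-C).
run(["nvidia-smi", "mig", "-cgi", "19,19", "-C"])

# 4. List the resulting MIG devices; each one gets a MIG-<UUID> identifier.
run(["nvidia-smi", "-L"])
```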

Example uses:

  • Running multiple AI inference tasks simultaneously on a single GPU (see the sketch after this list for pinning a workload to a specific MIG instance).
  • Supporting multi-tenant and multi-container systems in data centers.
  • Efficiently deploying small and medium-sized AI and HPC workloads without dedicating an entire GPU.
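
As an illustration of the first use case, the sketch below (not part of the original article) pins a worker process to a single MIG instance by setting CUDA_VISIBLE_DEVICES to that instance's UUID, as reported by nvidia-smi -L; the UUID and the worker script name are placeholders.

```python
# Minimal sketch: confine one inference worker to one MIG instance.
import os
import subprocess

# Placeholder -- use a real MIG device UUID from `nvidia-smi -L`.
mig_uuid = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = mig_uuid  # the child process sees only this slice

# Launch the worker (placeholder command) on the chosen MIG instance.
subprocess.run(["python", "inference_worker.py"], env=env, check=True)
```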
