This section describes Nvidia GPU support on Kommander. The Nvidia driver version supported by DKP is 470.x. GPUs from other vendors, such as AMD, are not currently supported. This document assumes familiarity with Kubernetes GPU support. For more information about GPUs in an AWS environment, see the Advanced AWS section.
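Before you enable GPU support, you can confirm that a node runs a driver from the supported 470 branch by checking the version that `nvidia-smi` reports. The following is a minimal Python sketch. It assumes that `nvidia-smi` is on the node's PATH. The helper names (`driver_versions_from_nvidia_smi`, `is_supported_driver`, `check_node`) are hypothetical and are not part of DKP.

```python
import subprocess


def driver_versions_from_nvidia_smi(output: str) -> list[str]:
    """Parse output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader`,
    which prints one driver version per GPU, one per line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def is_supported_driver(version: str, supported_major: int = 470) -> bool:
    """Return True if the version belongs to the supported major branch (470.x)."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) == supported_major


def check_node() -> bool:
    """Run nvidia-smi locally and check that every GPU reports a 470.x driver."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    versions = driver_versions_from_nvidia_smi(result.stdout)
    # A node with no GPUs reported is treated as unsupported.
    return bool(versions) and all(is_supported_driver(v) for v in versions)
```

For example, `is_supported_driver("470.82.01")` returns `True`, and `is_supported_driver("510.47.03")` returns `False`.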
Kommander GPU Overview
GPU support on Kommander uses the Nvidia container runtime. When Nvidia GPU support is enabled, Kommander configures the container runtime to run GPU containers and installs the components required to use the Nvidia GPU devices.
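Registering the Nvidia runtime with containerd is one common way to configure a container runtime for GPU containers. The following Python sketch renders such a fragment. It assumes containerd's version 2 configuration format and the default binary path `/usr/bin/nvidia-container-runtime`. The function name `nvidia_runtime_toml` is hypothetical, and the configuration that Kommander actually writes may differ.

```python
def nvidia_runtime_toml(
    binary: str = "/usr/bin/nvidia-container-runtime",
    runtime_name: str = "nvidia",
    make_default: bool = True,
) -> str:
    """Render a containerd (config version 2) fragment that registers
    nvidia-container-runtime as a runtime named `runtime_name`."""
    cri = 'plugins."io.containerd.grpc.v1.cri".containerd'
    lines = ["version = 2", ""]
    if make_default:
        # Containers use the Nvidia runtime unless they select another runtime.
        lines += [f"[{cri}]", f'  default_runtime_name = "{runtime_name}"', ""]
    lines += [
        # The runc v2 shim launches BinaryName in place of plain runc.
        f"[{cri}.runtimes.{runtime_name}]",
        '  runtime_type = "io.containerd.runc.v2"',
        "",
        f"[{cri}.runtimes.{runtime_name}.options]",
        f'  BinaryName = "{binary}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(nvidia_runtime_toml())
```

In this sketch, the output would be merged into containerd's configuration file (commonly `/etc/containerd/config.toml`), and containerd would then be restarted. Because `nvidia-container-runtime` wraps runc, only the binary name changes. The rest of the runc integration stays the same.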
The following components provide Nvidia GPU support on Kommander:
- nvidia-container-runtime: GPU support in Kommander depends on the containerd runtime. nvidia-container-runtime acts as a wrapper around runc, which simplifies the container runtime integration with the GPU.
- Nvidia Device Plugin: Kommander exposes Nvidia GPUs to Kubernetes through this device plugin. It allows GPU-enabled containers to run on Kubernetes and tracks the number and health of the available GPUs on each node.
- Nvidia Data Center GPU Manager: Contains a Prometheus exporter that provides Nvidia GPU metrics.
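With the device plugin running, a workload receives GPUs by setting a limit on the `nvidia.com/gpu` extended resource. The following minimal Python sketch builds such a Pod manifest as JSON. The function name `gpu_pod_manifest` is hypothetical, and the image is a placeholder that should be replaced with a CUDA image compatible with the 470 driver.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod that runs nvidia-smi once and requests `gpus` Nvidia GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["nvidia-smi"],
                    # GPUs are set as limits; the scheduler places the Pod only
                    # on a node where the device plugin reports enough free GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(gpu_pod_manifest("gpu-test", "example/cuda:placeholder"), indent=2))
```

Because kubectl accepts JSON manifests, the output can be piped to `kubectl apply -f -`.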
Kommander runs these components as DaemonSets, which makes them easier to manage and upgrade across all GPU nodes.
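A DaemonSet runs one copy of a Pod on every matching node, and its rolling update strategy replaces those Pods node by node when the image changes. The following Python sketch shows how such a DaemonSet could be restricted to GPU nodes. The function name `gpu_daemonset`, the node label `example.com/gpu=true`, and the `kube-system` namespace are hypothetical, because the source does not state which labels or namespaces Kommander uses.

```python
def gpu_daemonset(name: str, image: str, node_label=("example.com/gpu", "true")) -> dict:
    """Build a DaemonSet that runs `image` only on nodes carrying `node_label`.
    The default label is a hypothetical placeholder."""
    key, value = node_label
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            # The selector must match the Pod template labels.
            "selector": {"matchLabels": labels},
            # Upgrades replace Pods node by node instead of all at once.
            "updateStrategy": {"type": "RollingUpdate"},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule only onto GPU nodes.
                    "nodeSelector": {key: value},
                    # Tolerate a GPU taint, if the GPU nodes are tainted.
                    "tolerations": [
                        {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
                    ],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }
```

To upgrade a component across all GPU nodes in this model, you change only the image in the template, and the DaemonSet controller rolls out the change.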
The following procedures are described: