NVIDIA Platform Application Management Cluster
Instructions on enabling the NVIDIA platform application on a Management cluster
Enable NVIDIA Platform Application on Kommander for Management Cluster
If you intend to run applications that use GPUs on your cluster, install the NVIDIA GPU Operator. To enable NVIDIA GPU support when installing Kommander on a management cluster, perform the following steps:
Create an installation configuration file:
    dkp install kommander --init > install.yaml
Append the following to the apps section in the install.yaml file to enable NVIDIA platform services:
    apps:
      nvidia-gpu-operator:
        enabled: true
Install Kommander using the configuration file you created:
    dkp install kommander --installer-config ./install.yaml --kubeconfig=${CLUSTER_NAME}.conf
In the previous command, the --kubeconfig=${CLUSTER_NAME}.conf flag ensures that you set the context to install Kommander on the right cluster. For alternatives and recommendations around setting your context, refer to Provide Context for Commands with a kubeconfig File.
Proceed to the Select the Correct Toolkit Version for your NVIDIA GPU Operator section.
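Before running the install command, it can help to confirm that the configuration file really enables the operator. The following is a minimal sketch, not part of the dkp tooling: gpu_operator_enabled is a hypothetical helper, and it assumes the enabled key sits on the line immediately after the nvidia-gpu-operator key, as in the snippet shown in this section.

```shell
#!/bin/sh
# gpu_operator_enabled FILE
# Hypothetical helper: succeeds only if FILE contains an
# "nvidia-gpu-operator:" key whose very next line is "enabled: true".
gpu_operator_enabled() {
  awk '
    /^[[:space:]]*nvidia-gpu-operator:/ { found = 1; next }
    found && /^[[:space:]]*enabled:[[:space:]]*true/ { ok = 1; exit }
    found { exit }
    END { exit(ok ? 0 : 1) }
  ' "$1"
}
```

For example, running gpu_operator_enabled install.yaml && echo "GPU operator enabled" prints the message only when the check passes. This is a plain text match, so it does not validate the rest of the YAML.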
TIP: Sometimes, applications require a longer period of time to deploy, which causes the installation to time out. Add the --wait-timeout <time to wait> flag and specify a period of time (for example, 1h) to allocate more time to the deployment of applications.
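To illustrate the tip above, the following sketch assembles the install command with an extended timeout and prints it for review instead of running it. The install_cmd helper and the default values (my-cluster, 1h) are assumptions made for illustration; the flags are the ones named in this section.

```shell
#!/bin/sh
# Hypothetical defaults; override them in your environment.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
WAIT_TIMEOUT="${WAIT_TIMEOUT:-1h}"

# install_cmd is a hypothetical helper that only prints the command.
# It does not invoke dkp, so it is safe to run as a dry run.
install_cmd() {
  printf 'dkp install kommander --installer-config ./install.yaml --kubeconfig=%s.conf --wait-timeout %s\n' \
    "$CLUSTER_NAME" "$WAIT_TIMEOUT"
}

install_cmd
```

Once the printed command looks right for your cluster, copy it into your terminal to start the installation.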
Select the Correct Toolkit Version for your NVIDIA GPU Operator
The NVIDIA Container Toolkit allows users to run GPU-accelerated containers. The toolkit includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs. It must be configured correctly for your base operating system.
Kommander (Management Cluster) Customization
Select the correct Toolkit version based on your OS:
CentOS 7.9/RHEL 7.9
If you’re using CentOS 7.9 or RHEL 7.9 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your Kommander Installer Configuration file or <kommander.yaml> to the following:
    kind: Installation
    apps:
      nvidia-gpu-operator:
        enabled: true
        values: |
          toolkit:
            version: v1.10.0-centos7
RHEL 8.4/8.6 and SLES 15 SP3
If you’re using RHEL 8.4/8.6 or SLES 15 SP3 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your Kommander Installer Configuration file or <kommander.yaml> to the following:
    kind: Installation
    apps:
      nvidia-gpu-operator:
        enabled: true
        values: |
          toolkit:
            version: v1.10.0-ubi8
Ubuntu 18.04 and 20.04
If you’re using Ubuntu 18.04 or 20.04 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your Kommander Installer Configuration file or <kommander.yaml> to the following:
    kind: Installation
    apps:
      nvidia-gpu-operator:
        enabled: true
        values: |
          toolkit:
            version: v1.11.0-ubuntu20.04
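The OS-to-version mapping above can be captured in a small lookup, which may be handy when scripting installs across several clusters. This is a sketch only: toolkit_version_for is a hypothetical helper, and the OS labels it accepts (centos, rhel, sles, ubuntu, and the version strings) are assumptions chosen to mirror the wording of this section, not values read from the system.

```shell
#!/bin/sh
# toolkit_version_for OS VERSION
# Hypothetical helper: prints the toolkit.version value listed in this
# section for the given base OS, or fails for combinations not listed.
toolkit_version_for() {
  case "$1:$2" in
    centos:7.9|rhel:7.9)
      echo "v1.10.0-centos7" ;;
    rhel:8.4|rhel:8.6|"sles:15 SP3")
      echo "v1.10.0-ubi8" ;;
    ubuntu:18.04|ubuntu:20.04)
      echo "v1.11.0-ubuntu20.04" ;;
    *)
      echo "no toolkit version listed for: $1 $2" >&2
      return 1 ;;
  esac
}
```

For example, toolkit_version_for rhel 8.6 prints the value to place under toolkit.version in the configuration file. Combinations that this section does not list return a non-zero status rather than a guess.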
Install Kommander using the configuration file you created:
    dkp install kommander --installer-config ./install.yaml --kubeconfig=${CLUSTER_NAME}.conf
In the previous command, the --kubeconfig=${CLUSTER_NAME}.conf flag ensures that you set the context to install Kommander on the right cluster. For alternatives and recommendations around setting your context, refer to Provide Context for Commands with a kubeconfig File.
TIP: Sometimes, applications require a longer period of time to deploy, which causes the installation to time out. Add the --wait-timeout <time to wait> flag and specify a period of time (for example, 1h) to allocate more time to the deployment of applications.