NVIDIA Platform Application Attached or Managed Cluster
Instructions on enabling the NVIDIA platform application on attached or managed clusters
Enable NVIDIA Platform Application on Attached or Managed Clusters
If you intend to run applications that use GPUs on Attached or Managed clusters, you must enable the nvidia-gpu-operator platform application in the workspace.
To use the UI to enable the application, refer to the Customize a workspace’s applications section of the Platform Applications page.
To use the CLI, refer to the Deploy Platform Applications via CLI page.
If only a subset of the attached or managed clusters in the workspace use GPUs, refer to Enable an Application per Cluster to enable the nvidia-gpu-operator on specific clusters only.
After you have enabled the nvidia-gpu-operator app in the workspace on the necessary clusters, proceed to the next section.
Select the Correct Toolkit Version for your NVIDIA GPU Operator
The NVIDIA Container Toolkit allows users to run GPU-accelerated containers. The toolkit includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs, and it must be configured correctly for your base operating system.
Workspace (Attached and Managed clusters) Customization
Refer to AppDeployment resources for how to use the CLI to customize the platform application on a workspace.
If specific attached/managed clusters in the workspace require different configurations, refer to Customize an Application per Cluster for how to do this.
Select the correct Toolkit version based on your OS and create a ConfigMap with these configuration override values:
CentOS 7.9/RHEL 7.9
If you’re using CentOS 7.9 or RHEL 7.9 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your install.yaml to the following:
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ${WORKSPACE_NAMESPACE}
  name: nvidia-gpu-operator-overrides-attached
data:
  values.yaml: |
    toolkit:
      version: v1.10.0-centos7
EOF
```
RHEL 8.4/8.6 and SLES 15 SP3
If you’re using RHEL 8.4/8.6 or SLES 15 SP3 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your install.yaml to the following:
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ${WORKSPACE_NAMESPACE}
  name: nvidia-gpu-operator-overrides-attached
data:
  values.yaml: |
    toolkit:
      version: v1.10.0-ubi8
EOF
```
Ubuntu 18.04 and 20.04
If you’re using Ubuntu 18.04 or 20.04 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your install.yaml to the following:
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ${WORKSPACE_NAMESPACE}
  name: nvidia-gpu-operator-overrides-attached
data:
  values.yaml: |
    toolkit:
      version: v1.11.0-ubuntu20.04
EOF
```
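The OS-to-version mapping in the sections above can also be captured in a small helper, which may be convenient when the same override has to be produced for several workspaces. The following is a minimal sketch, not part of the product: the function names and the OS labels (such as rhel-8.6) are hypothetical, and only the toolkit versions and the ConfigMap name are taken from this page. It prints the manifest rather than applying it.

```shell
# Sketch only: function names and OS labels are hypothetical.
# The toolkit versions are the ones listed on this page.

# Print the toolkit version for a given base OS label.
toolkit_version_for_os() {
  case "$1" in
    centos-7.9|rhel-7.9)            echo "v1.10.0-centos7" ;;
    rhel-8.4|rhel-8.6|sles-15-sp3)  echo "v1.10.0-ubi8" ;;
    ubuntu-18.04|ubuntu-20.04)      echo "v1.11.0-ubuntu20.04" ;;
    *) echo "unsupported OS label: $1" >&2; return 1 ;;
  esac
}

# Print the override ConfigMap manifest.
#   $1: base OS label, $2: workspace namespace
render_overrides_configmap() {
  local version
  version="$(toolkit_version_for_os "$1")" || return 1
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: $2
  name: nvidia-gpu-operator-overrides-attached
data:
  values.yaml: |
    toolkit:
      version: ${version}
EOF
}
```

To apply the result, the output could be piped to kubectl, for example: render_overrides_configmap rhel-8.6 "${WORKSPACE_NAMESPACE}" | kubectl apply -f -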
Note the name of this ConfigMap (nvidia-gpu-operator-overrides-attached) and use it to set the necessary nvidia-gpu-operator AppDeployment spec fields, depending on the scope of the override. Alternatively, you can use the UI to pass in the configuration overrides for the app per workspace or per cluster.
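As an illustration of the CLI route for a workspace-scoped override, the sketch below builds a merge patch that points the AppDeployment at the ConfigMap. It rests on assumptions this page does not confirm: that the AppDeployment is named nvidia-gpu-operator and that spec.configOverrides.name is the workspace-level field. Check both against the AppDeployment resources page for your version. The helper names are hypothetical, and the command is printed for review rather than executed.

```shell
# Sketch only. Assumptions not stated on this page:
#   - the AppDeployment is named "nvidia-gpu-operator"
#   - spec.configOverrides.name is the workspace-scoped override field

# Build the JSON merge patch.
#   $1: name of the ConfigMap holding the override values
build_overrides_patch() {
  printf '{"spec":{"configOverrides":{"name":"%s"}}}\n' "$1"
}

# Print the kubectl command instead of running it, so it can be reviewed first.
#   $1: workspace namespace, $2: ConfigMap name
print_patch_command() {
  printf "kubectl patch appdeployment nvidia-gpu-operator -n %s --type merge -p '%s'\n" \
    "$1" "$(build_overrides_patch "$2")"
}
```

Calling print_patch_command "${WORKSPACE_NAMESPACE}" nvidia-gpu-operator-overrides-attached prints a command that can be inspected and then run. A cluster-scoped override would need different spec fields, as described in Customize an Application per Cluster.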