Workshop Setup

Deploy the NVIDIA NIM Operator

20 minutes

The NVIDIA GPU Operator is a Kubernetes Operator that automates the deployment, configuration, and management of all necessary NVIDIA software components to provision GPUs within a Kubernetes cluster.

The NVIDIA NIM Operator is used to deploy LLMs in Kubernetes environments, such as the OpenShift cluster we created earlier in this workshop.

This section of the workshop walks through the steps necessary to deploy both the NVIDIA GPU and NIM operators in our OpenShift cluster.

Create an NVIDIA NGC Account

An NVIDIA GPU Cloud (NGC) account is required to download LLMs and deploy them using the NVIDIA NIM Operator. You can register here to create an account.

Register with the NVIDIA Developer Program

Registering with the NVIDIA Developer Program gives us access to NVIDIA NIM, which we’ll use later in the workshop to deploy LLMs.

Ensure that NVIDIA Developer Program appears on your list of NVIDIA subscriptions in NGC:

NVIDIA Subscriptions

Generate an NGC API Key

Once you’re logged in to the NGC website, click your user account icon in the top-right corner of the screen and select Setup.

Then click Generate API Key and follow the instructions. Ensure the key is associated with the NGC Catalog and Secrets Manager services.

Save the generated key in a safe place as we’ll use it later in the workshop.

Refer to NVIDIA Documentation for further details on generating an NGC API key.

Install the Node Feature Discovery Operator

The steps in this section are based on Installing the NFD Operator using the CLI.

Run the following script to install the Node Feature Discovery Operator:

bash
cd nvidia
./install-nfd-operator.sh

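To verify that the Operator deployment is successful, list the pods in the Operator's namespace. The command below assumes the script installs into the openshift-nfd namespace, as the referenced Red Hat procedure does:

bash
oc get pods -n openshift-nfd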

Create a NodeFeatureDiscovery CR

The steps in this section are based on Creating a NodeFeatureDiscovery CR by using the CLI.

Run the following script to create the Node Feature Discovery CR:

bash
./create-nfd-cr.sh
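Once the NodeFeatureDiscovery CR has been reconciled, NFD labels each node according to the PCI devices it detects. As a quick sanity check, the sketch below counts the nodes that carry the label for PCI vendor ID 10de (NVIDIA). The helper name `count_nvidia_nodes` is hypothetical, and the label key assumes NFD's default naming; adjust it if your CR customizes the labels.

```shell
# Count nodes that NFD has labeled as having an NVIDIA PCI device.
# 10de is NVIDIA's PCI vendor ID; the label key assumes NFD's default naming.
count_nvidia_nodes() {
  oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true -o name \
    | grep -c '^node/'
}
```

Running `count_nvidia_nodes` prints the number of matching nodes. A result of 0 suggests that NFD has not finished labeling yet, or that the cluster has no GPU worker nodes.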

Install the NVIDIA GPU Operator

The steps in this section are based on Installing the NVIDIA GPU Operator on OpenShift.

Run the following script to install the NVIDIA GPU Operator:

bash
./install-nvidia-gpu-operator.sh

Wait until the install plan has been created:

bash
oc get installplan -n nvidia-gpu-operator
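The install plan can take a little while to appear after the Operator is installed. Instead of re-running the command by hand, a small polling helper can wait for it. This is a minimal sketch: the function names `retry_until` and `install_plan_exists` are hypothetical, and the attempt count and delay in the usage note are arbitrary.

```shell
# Retry a command until it succeeds or the attempt budget is exhausted.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Succeeds only when at least one install plan exists in the namespace.
install_plan_exists() {
  [ -n "$(oc get installplan -n nvidia-gpu-operator -o name 2>/dev/null)" ]
}
```

For example, `retry_until 30 10 install_plan_exists` checks every 10 seconds, for up to 30 attempts, and returns a non-zero status if no install plan shows up in that time.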

Approve the install plan with the following commands:

bash
INSTALL_PLAN=$(oc get installplan -n nvidia-gpu-operator -oname)
oc patch $INSTALL_PLAN -n nvidia-gpu-operator --type merge --patch '{"spec":{"approved":true }}'

Create the Cluster Policy

The steps in this section are based on Create the cluster policy using the CLI.

Run the following script to create the cluster policy:

bash
./create-cluster-policy.sh

Verify the NVIDIA GPU Operator Installation

Verify the successful installation of the NVIDIA GPU Operator using the following command:

bash
oc get pods,daemonset -n nvidia-gpu-operator
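The GPU Operator namespace holds many pods, so it can help to filter the list down to the ones that still need attention. The sketch below reads the default `oc get pods` table from standard input and prints pods that have neither run to completion nor become fully ready. The function name `pods_not_ready` is hypothetical, and the parsing assumes the default column order (NAME, READY, STATUS, ...).

```shell
# Print "name ready status" for pods that are not yet settled.
# A pod counts as settled when STATUS is Completed, or when STATUS is
# Running and the READY column shows all containers ready (e.g. 2/2).
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

Use it as `oc get pods -n nvidia-gpu-operator | pods_not_ready`. Empty output means that every pod is either Completed or Running with all of its containers ready.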

Install the Operator SDK

The steps in this section are based on Install from GitHub release.

Download the release binary

Set platform information:

bash
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')

Download the binary for your platform:

bash
export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.41.1
curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}

Verify the downloaded binary

Import the operator-sdk release GPG key from keyserver.ubuntu.com:

bash
gpg --keyserver keyserver.ubuntu.com --recv-keys 052996E2A20B5C7E

Download the checksums file and its signature, then verify the signature:

bash
curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt
curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt.asc
gpg -u "Operator SDK (release) <cncf-operator-sdk@cncf.io>" --verify checksums.txt.asc

You should see something similar to the following:

bash
gpg: assuming signed data in 'checksums.txt'
gpg: Signature made Fri 30 Oct 2020 12:15:15 PM PDT
gpg:                using RSA key ADE83605E945FA5A1BD8639C59E5B47624962185
gpg: Good signature from "Operator SDK (release) <cncf-operator-sdk@cncf.io>" [ultimate]

Make sure the checksums match:

bash
grep operator-sdk_${OS}_${ARCH} checksums.txt | sha256sum -c -

You should see something similar to the following:

bash
operator-sdk_linux_amd64: OK
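The platform detection above can resolve `OS` to `darwin`, and macOS typically ships `shasum` rather than `sha256sum`. A small wrapper, sketched below, uses whichever tool is available. The function name `sha256_check` is hypothetical.

```shell
# Verify checksum lines read from stdin, using whichever SHA-256 tool exists.
sha256_check() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum -c -
  else
    shasum -a 256 -c -
  fi
}
```

It replaces the `sha256sum -c -` part of the pipeline: `grep "operator-sdk_${OS}_${ARCH}" checksums.txt | sha256_check`.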

Install the release binary in your PATH

bash
chmod +x operator-sdk_${OS}_${ARCH} && sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk

Install the NGC CLI

The steps in this section are based on NGC CLI Install.

Click Download CLI to download the zip file that contains the binary, transfer it to a directory where you have execute permissions, then unzip it and run the binary. Alternatively, download and unzip the CLI from the command line by changing to a directory where you have execute permissions and running the following command:

bash
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.3.0/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip

Check the MD5 hashes of the extracted files to ensure nothing was corrupted during download:

bash
find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5

Check the SHA256 hash of the downloaded zip file as well. Run the following command:

bash
sha256sum ngccli_linux.zip

Compare with the following value, which can also be found in the Release Notes of the Resource:

bash
5f01eff85a66c895002f3c87db2933c462f3b86e461e60d515370f647b4ffc21
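The comparison can also be scripted rather than done by eye. The sketch below compares a file's SHA256 digest with an expected hex string and signals the result through its exit status. The function name `sha256_matches` is hypothetical, and the sketch assumes `sha256sum` is available, as it is on most Linux systems.

```shell
# Return success only if the file's SHA-256 digest equals the expected value.
# Usage: sha256_matches <file> <expected_hex_digest>
sha256_matches() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}
```

For example: `sha256_matches ngccli_linux.zip 5f01eff85a66c895002f3c87db2933c462f3b86e461e60d515370f647b4ffc21 && echo "checksum OK"`.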

After verifying the value, make the NGC CLI binary executable and add the ngc-cli directory to your PATH:

bash
chmod u+x ngc-cli/ngc
echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
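Running the `echo ... >> ~/.bash_profile` line more than once appends a duplicate entry each time. If you expect to repeat this step, a guard like the one sketched below appends the line only when it is missing. The function name `append_line_once` is hypothetical.

```shell
# Append a line to a file only if that exact line is not already present.
# Usage: append_line_once <line> <file>
append_line_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example: `append_line_once "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" ~/.bash_profile && source ~/.bash_profile`.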

Configure the NGC CLI before running any other NGC commands.

Run the following command and enter your API key when prompted:

bash
ngc config set

Define an environment variable with your NGC API key:

bash
export NGC_API_KEY="<your NGC API key>"
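Typing the key directly on the command line leaves it in your shell history. As an alternative, the sketch below prompts for the key without echoing it, and adds a guard that later steps can call to fail early if the variable is missing. Both function names are hypothetical, and `read -s` is specific to bash.

```shell
# Prompt for the key without echoing it to the terminal (bash: `read -s`).
prompt_ngc_api_key() {
  printf 'NGC API key: ' >&2
  read -rs NGC_API_KEY
  printf '\n' >&2
  export NGC_API_KEY
}

# Fail early, with a clear message, if the key is missing or empty.
require_ngc_api_key() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
}
```

Call `prompt_ngc_api_key` once in your current shell, then `require_ngc_api_key` before any step that depends on the key.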

Install the NVIDIA NIM Operator

The steps in this section are based on Installing NIM Operator on Red Hat OpenShift Using operator-sdk (for Development-Only).

Run the following script to install the NIM operator:

bash
./install-nim-operator.sh

Confirm the controller pod is running:

bash
oc get pods -n nvidia-nim-operator
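Rather than polling the pod list, `oc wait` can block until the pods report Ready. The sketch below wraps that call. The function name `wait_for_pods_ready` and the 300-second default timeout are assumptions. Note that `oc wait` returns an error if no pods exist yet in the namespace, so allow the install script a moment to create them first.

```shell
# Block until every pod in the namespace reports Ready, or time out.
# Usage: wait_for_pods_ready <namespace> [timeout, e.g. 300s]
wait_for_pods_ready() {
  namespace=$1
  timeout=${2:-300s}
  oc wait --for=condition=Ready pod --all -n "$namespace" --timeout="$timeout"
}
```

For example: `wait_for_pods_ready nvidia-nim-operator`.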