Provisioning a VM

Objectives

  • Provision a GPU-enabled virtual machine on the NAIC Orchestrator

  • Connect to the VM via SSH with proper key configuration

  • Verify GPU availability and driver installation

  • Understand the NAIC Orchestrator VM environment

NAIC Orchestrator

The NAIC Orchestrator at orchestrator.naic.no provides virtual machines pre-configured for AI workloads. These VMs come with NVIDIA GPU drivers, CUDA toolkit, and standard ML libraries pre-installed.

Step 1: Request a VM

  1. Log in to the NAIC Orchestrator portal with your Feide credentials

  2. Select “Create VM” from the dashboard

  3. Choose a GPU-enabled flavor:

    • 1x NVIDIA T4 (16 GB VRAM) – recommended for this demonstrator

    • 1x NVIDIA A100 (40/80 GB VRAM) – for large-scale experiments or multiple UCs

  4. Select Ubuntu 22.04 as the operating system

  5. Upload your SSH public key (or use one already registered)

  6. Note the assigned IP address once the VM is ready

VM Startup Time

VM provisioning typically takes 2-5 minutes. The portal status will change from “Building” to “Active” when the VM is ready to accept SSH connections. If the VM stays in “Building” for more than 10 minutes, try deleting and recreating it.

Step 2: Connect via SSH

ssh -i ~/.ssh/naic-vm.pem ubuntu@<YOUR_VM_IP>

Replace <YOUR_VM_IP> with your actual VM IP address.

SSH Troubleshooting

If you cannot connect, check these common issues:

Problem

Solution

Permission denied (publickey)

Ensure your key file has correct permissions: chmod 600 ~/.ssh/naic-vm.pem

Connection timed out

Verify the VM is in “Active” state in the portal; check that your network allows outbound SSH (port 22)

Host key verification failed

Remove the old entry: ssh-keygen -R <YOUR_VM_IP> and reconnect

Connection refused

The SSH daemon may not have started yet; wait 1-2 minutes after VM creation

SSH Config (Optional)

For convenience, add the VM to your SSH config file (~/.ssh/config):

Host naic-vm
    HostName <YOUR_VM_IP>
    User ubuntu
    IdentityFile ~/.ssh/naic-vm.pem
    StrictHostKeyChecking no

Then connect with just ssh naic-vm.

Step 3: Initialize the VM

curl -O https://raw.githubusercontent.com/NAICNO/wp7-UC5-ais-classification-gnn/main/vm-init.sh
chmod +x vm-init.sh
./vm-init.sh

This installs system packages, checks for GPU drivers, and configures CUDA.

Step 4: Verify GPU Availability

After initialization, confirm that the GPU is recognized:

# Check NVIDIA driver and GPU
nvidia-smi

You should see output showing your GPU model, driver version, and CUDA version. For example:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03   Driver Version: 535.129.03   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
+-------------------------------+----------------------+----------------------+

If nvidia-smi is not found or shows an error, the GPU drivers may need to be installed:

# Install NVIDIA drivers (if not pre-installed)
sudo apt-get update
sudo apt-get install -y nvidia-driver-535
sudo reboot

After rebooting, reconnect via SSH and verify with nvidia-smi again.

CUDA Toolkit Version

The CUDA version shown by nvidia-smi is the maximum supported CUDA version for the driver. PyTorch and DGL ship their own CUDA runtime libraries, so the PyTorch CUDA version does not need to exactly match the driver version – it just needs to be equal to or lower than the driver version.