Setting Up the Environment

Objectives

  • Clone the repository and run the automated setup script

  • Understand what setup.sh does at each step

  • Perform manual installation for CPU-only or custom CUDA setups

  • Verify that PyTorch, DGL, and the ais_dgl package work together

  • Register a Jupyter kernel for notebook usage

Quick Start

The fastest way to get started is the automated setup script:

git clone https://github.com/NAICNO/wp7-UC5-ais-classification-gnn.git
cd wp7-UC5-ais-classification-gnn
chmod +x setup.sh
./setup.sh
source venv/bin/activate

What setup.sh Does

The setup script performs these steps in order:

  1. Detect GPU availability: Checks for nvidia-smi and queries the CUDA version to determine whether to install GPU or CPU packages

  2. Create a Python virtual environment: Creates venv/ using python3 -m venv

  3. Upgrade pip: Ensures the latest pip version is available

  4. Install PyTorch: Installs PyTorch with the matching CUDA version (or CPU-only if no GPU is detected)

  5. Install DGL: Installs the Deep Graph Library with the matching CUDA backend

  6. Install the ais_dgl package: Installs the project package in development mode (pip install -e .)

  7. Register Jupyter kernel: Creates an ais_dgl kernel for use in Jupyter notebooks

CUDA Version Matching

PyTorch and DGL must be installed with matching CUDA versions. The setup.sh script handles this automatically: it reads the CUDA version from the nvidia-smi output and selects the appropriate package index for both libraries. If you see errors about CUDA version mismatches, delete the venv/ directory and run setup.sh again.
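
The version-matching idea can be sketched in Python. This is an illustration only, not the contents of setup.sh: the function names parse_cuda_version, pick_wheel_tag, and detect_wheel_tag are hypothetical, and the mapping (CUDA 12.x to cu121, CUDA 11.x to cu118, everything else to cpu) is an assumption that simply mirrors the manual installation variants in this lesson.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of the
    nvidia-smi banner, or None if the field is absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_wheel_tag(cuda_version):
    """Map a CUDA version to a wheel tag (assumed mapping, see lead-in)."""
    if cuda_version is None:
        return "cpu"
    major, _minor = cuda_version
    if major == 12:
        return "cu121"
    if major == 11:
        return "cu118"
    return "cpu"  # unknown CUDA generation: fall back to CPU wheels


def detect_wheel_tag():
    """Run nvidia-smi if it exists and pick a tag; 'cpu' on any failure."""
    if shutil.which("nvidia-smi") is None:
        return "cpu"
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        )
    except OSError:
        return "cpu"
    if result.returncode != 0:
        return "cpu"
    return pick_wheel_tag(parse_cuda_version(result.stdout))


if __name__ == "__main__":
    tag = detect_wheel_tag()
    print(f"PyTorch index: https://download.pytorch.org/whl/{tag}")
```

Using one tag for both the PyTorch and the DGL install is what keeps the two libraries on the same CUDA build.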

Manual Installation

If you need more control over the installation (e.g., a specific PyTorch version or CPU-only setup), follow these steps.

CPU-Only Installation

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch (CPU only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Install DGL (CPU only)
pip install dgl -f https://data.dgl.ai/wheels/repo.html

# Install project package
pip install -e .

GPU Installation (CUDA 12.x)

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install DGL with CUDA 12.1
pip install dgl -f https://data.dgl.ai/wheels/cu121/repo.html

# Install project package
pip install -e .

GPU Installation (CUDA 11.x)

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install DGL with CUDA 11.8
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html

# Install project package
pip install -e .

Verify Installation

Run these checks to confirm everything is installed correctly:

# Check PyTorch and DGL versions
python -c "import torch; import dgl; print(f'PyTorch {torch.__version__}, DGL {dgl.__version__}')"

# Check CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Check that DGL can create a graph
python -c "import dgl; g = dgl.graph(([0,1],[1,2])); print(f'DGL graph: {g.num_nodes()} nodes, {g.num_edges()} edges')"

# Run the test suite
pytest tests/ -v
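
If one of the imports above fails, it can help to see which packages are missing without importing them. The following is a minimal sketch that uses only the standard library; the helper name installed_versions is hypothetical, and the package names passed to it are the PyPI distribution names.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["torch", "dgl", "ipykernel"])
    for name, version in report.items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the virtual environment activated; otherwise it reports on the system Python instead of venv/.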

Verifying DGL and PyTorch Work Together

A quick smoke test to confirm the full pipeline works:

import torch
import dgl

# Create a simple graph
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g = dgl.add_self_loop(g)

# Add node features
g.ndata['feat'] = torch.randn(3, 4)

# Test a GCN layer
from dgl.nn import GraphConv
conv = GraphConv(4, 2)
out = conv(g, g.ndata['feat'])
print(f'Input shape: {g.ndata["feat"].shape}')
print(f'Output shape: {out.shape}')
print('DGL + PyTorch integration OK')

If you have a GPU, also verify GPU tensor operations:

import torch
if torch.cuda.is_available():
    x = torch.randn(3, 4, device='cuda')
    print(f'GPU tensor device: {x.device}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
else:
    print('No GPU available -- CPU mode will be used')

Jupyter Kernel Setup

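To use the environment in Jupyter notebooks, register the kernel. This requires the ipykernel package in the environment; run pip install ipykernel first if it is missing: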

source venv/bin/activate
python -m ipykernel install --user --name=ais_dgl --display-name "AIS DGL (Python 3)"

Verify the kernel is registered:

jupyter kernelspec list

You should see ais_dgl in the list. When opening the notebook, select “AIS DGL (Python 3)” as the kernel.
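
The same check can be scripted, for example as part of a setup test. The sketch below parses the JSON that jupyter kernelspec list --json prints; the helper name kernel_registered is hypothetical.

```python
import json
import shutil
import subprocess


def kernel_registered(kernelspec_json, name="ais_dgl"):
    """Return True if `name` is a key under 'kernelspecs' in the JSON text
    produced by `jupyter kernelspec list --json`."""
    data = json.loads(kernelspec_json)
    return name in data.get("kernelspecs", {})


if __name__ == "__main__":
    if shutil.which("jupyter") is None:
        print("jupyter not found on PATH; activate the venv first")
    else:
        result = subprocess.run(
            ["jupyter", "kernelspec", "list", "--json"],
            capture_output=True, text=True, check=False,
        )
        if result.returncode == 0:
            print("ais_dgl registered:", kernel_registered(result.stdout))
        else:
            print("jupyter kernelspec list failed:", result.stderr.strip())
```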

If JupyterLab is not installed, add it to the environment and start it:

pip install jupyterlab
jupyter lab --no-browser --ip=127.0.0.1 --port=8888

Data Requirements

Place the AIS dataset files in a data/ directory:

File            Shape         Description
X_ts12.npy      (N, 3, 12)    Feature array: velocity, distance to shore, curvature for 12 time steps
y_ts12.npy      (N,)          Label array: 0 (non-fishing) or 1 (fishing)
bidx_ts12.npy   (50, N)       Bootstrap split indices for 50 different train/val/test splits

The total dataset contains ~23,500 samples split into training (14,100), validation (4,700), and test (4,700) sets.
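
Before training, it is worth confirming that the three files agree on N. The following is a minimal sketch, assuming the files sit in data/ under the names listed above; the helper names check_shapes and load_and_check are hypothetical, and the sketch checks array shapes only, not how the split indices are encoded.

```python
from pathlib import Path


def check_shapes(x_shape, y_shape, bidx_shape, n_splits=50):
    """Return a list of problems; an empty list means the shapes match
    (N, 3, 12), (N,), and (n_splits, N) with a common N."""
    x_shape, y_shape, bidx_shape = tuple(x_shape), tuple(y_shape), tuple(bidx_shape)
    problems = []
    if len(x_shape) != 3 or x_shape[1:] != (3, 12):
        problems.append(f"X_ts12.npy should be (N, 3, 12), got {x_shape}")
        return problems  # N is unknown, so the remaining checks cannot run
    n = x_shape[0]
    if y_shape != (n,):
        problems.append(f"y_ts12.npy should be ({n},), got {y_shape}")
    if bidx_shape != (n_splits, n):
        problems.append(f"bidx_ts12.npy should be ({n_splits}, {n}), got {bidx_shape}")
    return problems


def load_and_check(data_dir="data"):
    """Open the three arrays and validate their shapes."""
    import numpy as np  # imported lazily so check_shapes works without NumPy

    data_dir = Path(data_dir)
    # mmap_mode="r" reads only the headers here instead of loading full arrays
    x = np.load(data_dir / "X_ts12.npy", mmap_mode="r")
    y = np.load(data_dir / "y_ts12.npy", mmap_mode="r")
    bidx = np.load(data_dir / "bidx_ts12.npy", mmap_mode="r")
    return check_shapes(x.shape, y.shape, bidx.shape)


if __name__ == "__main__":
    if Path("data").is_dir():
        issues = load_and_check("data")
        print("Shapes OK" if not issues else "\n".join(issues))
    else:
        print("data/ directory not found")
```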

Keypoints

  • Run setup.sh for automated installation with GPU auto-detection

  • PyTorch and DGL must have matching CUDA versions – the setup script handles this

  • For CPU-only installations, install PyTorch with the --index-url flag pointing at the CPU wheel URL (https://download.pytorch.org/whl/cpu)

  • Verify the installation with import torch; import dgl and a simple graph creation test

  • Register the Jupyter kernel with python -m ipykernel install --user --name=ais_dgl

  • The dataset consists of three .npy files placed in the data/ directory

  • Delete venv/ and re-run setup.sh if you encounter version mismatch errors