Setting Up the Environment

Objectives

Clone the repository and run the automated setup script
Understand what setup.sh does at each step
Perform manual installation for CPU-only or custom CUDA setups
Verify that PyTorch, DGL, and the ais_dgl package work together
Register a Jupyter kernel for notebook usage

Quick Start

The fastest way to get started is the automated setup script:

git clone https://github.com/NAICNO/wp7-UC5-ais-classification-gnn.git
cd wp7-UC5-ais-classification-gnn
chmod +x setup.sh
./setup.sh
source venv/bin/activate
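
To confirm that the activation step took effect, you can ask the interpreter itself. This is a minimal sketch, assuming the environment was created with python3 -m venv as above; the helper name in_virtualenv is hypothetical and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtualenv()}")
```

If the second line prints False, run source venv/bin/activate again in the same shell before installing anything.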

What `setup.sh` Does

The setup script performs these steps in order:

1. Detect GPU availability: Checks for nvidia-smi and queries the CUDA version to determine whether to install GPU or CPU packages
2. Create a Python virtual environment: Creates venv/ using python3 -m venv
3. Upgrade pip: Ensures the latest pip version is available
4. Install PyTorch: Installs PyTorch with the matching CUDA version (or CPU-only if no GPU is detected)
5. Install DGL: Installs the Deep Graph Library with the matching CUDA backend
6. Install the ais_dgl package: Installs the project package in development mode (pip install -e .)
7. Register Jupyter kernel: Creates an ais_dgl kernel for use in Jupyter notebooks
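
The GPU detection step can be illustrated in Python. This is a sketch only and not the logic of setup.sh itself, which is a shell script. It assumes that the nvidia-smi header contains a field of the form "CUDA Version: 12.2"; the function names and the sample header line are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from nvidia-smi text, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi if it is on PATH; return None when no GPU is usable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    return parse_cuda_version(result.stdout)


# Hypothetical header line, used only to demonstrate the parser.
sample = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2  |"
print(parse_cuda_version(sample))
print(detect_cuda_version())
```

A result of None corresponds to the CPU-only branch in the step list above.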

CUDA Version Matching

PyTorch and DGL must be installed as builds for the same CUDA version. The setup.sh script handles this automatically: it reads the CUDA version from the nvidia-smi output and selects the matching package index for both libraries. If you see errors about CUDA version mismatches, delete the venv/ directory and run setup.sh again.
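
The matching rule amounts to choosing a single wheel tag and using it for both libraries. The sketch below assumes the same grouping as the manual installation sections of this page (CUDA 12.x uses cu121, CUDA 11.x uses cu118, anything else falls back to CPU). The function names are hypothetical, and the actual selection logic in setup.sh may differ.

```python
from typing import List, Optional, Tuple

PYTORCH_INDEX = "https://download.pytorch.org/whl/{tag}"
DGL_WHEELS = {
    "cpu": "https://data.dgl.ai/wheels/repo.html",
    "cu118": "https://data.dgl.ai/wheels/cu118/repo.html",
    "cu121": "https://data.dgl.ai/wheels/cu121/repo.html",
}


def select_wheel_tag(cuda_version: Optional[Tuple[int, int]]) -> str:
    """Map a detected (major, minor) CUDA version to one shared wheel tag."""
    if cuda_version is None:
        return "cpu"
    major = cuda_version[0]
    if major == 12:
        return "cu121"
    if major == 11:
        return "cu118"
    # Assumption: unknown CUDA generations fall back to the CPU wheels.
    return "cpu"


def install_commands(cuda_version: Optional[Tuple[int, int]]) -> List[str]:
    """Build the two pip commands so that both use the same tag."""
    tag = select_wheel_tag(cuda_version)
    return [
        f"pip install torch torchvision --index-url {PYTORCH_INDEX.format(tag=tag)}",
        f"pip install dgl -f {DGL_WHEELS[tag]}",
    ]


for command in install_commands((12, 2)):
    print(command)
```

Deriving both commands from one tag is what prevents a mismatch: PyTorch and DGL never end up on different CUDA builds.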

Manual Installation

If you need more control over the installation (e.g., a specific PyTorch version or CPU-only setup), follow these steps.

CPU-Only Installation

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch (CPU only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Install DGL (CPU only)
pip install dgl -f https://data.dgl.ai/wheels/repo.html

# Install project package
pip install -e .

GPU Installation (CUDA 12.x)

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install DGL with CUDA 12.1
pip install dgl -f https://data.dgl.ai/wheels/cu121/repo.html

# Install project package
pip install -e .

GPU Installation (CUDA 11.x)

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install DGL with CUDA 11.8
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html

# Install project package
pip install -e .

Verify Installation

Run these checks to confirm everything is installed correctly:

# Check PyTorch and DGL versions
python -c "import torch; import dgl; print(f'PyTorch {torch.__version__}, DGL {dgl.__version__}')"

# Check CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Check that DGL can create a graph
python -c "import dgl; g = dgl.graph(([0,1],[1,2])); print(f'DGL graph: {g.num_nodes()} nodes, {g.num_edges()} edges')"

# Run the test suite
pytest tests/ -v
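
If one of the commands above fails with an import error, it helps to see which package is missing without importing anything heavy. This is a minimal sketch using only the standard library; it assumes the project package is importable under the name ais_dgl, and the helper name check_modules is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(["torch", "dgl", "ais_dgl"])
for name, found in status.items():
    print(f"{name:10s} {'found' if found else 'MISSING'}")
```

A module reported as MISSING usually means the virtual environment is not active or the corresponding pip install step was skipped.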

Verifying DGL and PyTorch Work Together

The following smoke test runs a single GCN layer on a small graph to confirm that DGL and PyTorch work together:

import torch
import dgl

# Create a simple graph
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g = dgl.add_self_loop(g)

# Add node features
g.ndata['feat'] = torch.randn(3, 4)

# Test a GCN layer
from dgl.nn import GraphConv
conv = GraphConv(4, 2)
out = conv(g, g.ndata['feat'])
print(f'Input shape: {g.ndata["feat"].shape}')
print(f'Output shape: {out.shape}')
print('DGL + PyTorch integration OK')

If you have a GPU, also verify GPU tensor operations:

import torch
if torch.cuda.is_available():
    x = torch.randn(3, 4, device='cuda')
    print(f'GPU tensor device: {x.device}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
else:
    print('No GPU available -- CPU mode will be used')

Jupyter Kernel Setup

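The setup script registers the kernel automatically. After a manual installation, register it yourself (this requires the ipykernel package to be installed in the environment):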

source venv/bin/activate
python -m ipykernel install --user --name=ais_dgl --display-name "AIS DGL (Python 3)"

Verify the kernel is registered:

jupyter kernelspec list

You should see ais_dgl in the list. When opening a notebook, select "AIS DGL (Python 3)" as the kernel.
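
The same check can be scripted by parsing the output of jupyter kernelspec list --json. The sketch below assumes that the output is a JSON object with a top-level "kernelspecs" mapping keyed by kernel name. The helper name kernel_registered and the sample path are hypothetical.

```python
import json


def kernel_registered(kernelspec_json: str, name: str = "ais_dgl") -> bool:
    """Return True if `name` appears among the kernels in the JSON listing."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return name in specs


# Hypothetical listing, shaped like the output of `jupyter kernelspec list --json`.
sample = json.dumps(
    {
        "kernelspecs": {
            "ais_dgl": {
                "resource_dir": "/home/user/.local/share/jupyter/kernels/ais_dgl",
                "spec": {"display_name": "AIS DGL (Python 3)", "language": "python"},
            }
        }
    }
)
print(kernel_registered(sample))
```

In practice, capture the command output (for example with subprocess.run) and pass its stdout to kernel_registered.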

If JupyterLab is not installed, install it in the environment and start it:

pip install jupyterlab
jupyter lab --no-browser --ip=127.0.0.1 --port=8888

Data Requirements

Place the AIS dataset files in a data/ directory:

| File | Shape | Description |
| --- | --- | --- |
| `X_ts12.npy` | `(N, 3, 12)` | Feature array: velocity, distance to shore, curvature for 12 time steps |
| `y_ts12.npy` | `(N,)` | Label array: 0 (non-fishing) or 1 (fishing) |
| `bidx_ts12.npy` | `(50, N)` | Bootstrap split indices for 50 different train/val/test splits |

The full dataset contains about 23,500 samples, split 60/20/20 into training (14,100), validation (4,700), and test (4,700) sets.
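
Before training, it can be worth checking that the three files agree with the shapes in the table. This is a minimal sketch with hypothetical helper names: check_shapes compares shape tuples only, and load_shapes assumes the files sit in data/ and that NumPy is installed. It makes no assumption about how the split indices are encoded.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

N_FEATURES, N_STEPS, N_SPLITS = 3, 12, 50
FILE_NAMES = ("X_ts12.npy", "y_ts12.npy", "bidx_ts12.npy")


def check_shapes(
    x_shape: Sequence[int], y_shape: Sequence[int], bidx_shape: Sequence[int]
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if len(x_shape) != 3 or tuple(x_shape[1:]) != (N_FEATURES, N_STEPS):
        problems.append(f"X_ts12.npy: expected (N, 3, 12), got {tuple(x_shape)}")
        return problems  # N is unknown, so the remaining checks cannot run
    n = x_shape[0]
    if tuple(y_shape) != (n,):
        problems.append(f"y_ts12.npy: expected ({n},), got {tuple(y_shape)}")
    if tuple(bidx_shape) != (N_SPLITS, n):
        problems.append(f"bidx_ts12.npy: expected (50, {n}), got {tuple(bidx_shape)}")
    return problems


def load_shapes(data_dir: str = "data") -> Tuple[tuple, ...]:
    """Read only the array shapes, using memory mapping to avoid loading the data."""
    import numpy as np  # imported lazily so check_shapes works without NumPy

    return tuple(
        np.load(Path(data_dir) / name, mmap_mode="r").shape for name in FILE_NAMES
    )


# Shapes consistent with the table above, using N = 23,500.
print(check_shapes((23500, 3, 12), (23500,), (50, 23500)))  # → []
```

With the files in place, check_shapes(*load_shapes("data")) runs the same comparison on the real arrays.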

Keypoints

Run setup.sh for automated installation with GPU auto-detection
PyTorch and DGL must have matching CUDA versions – the setup script handles this
For CPU-only installations, install PyTorch with --index-url https://download.pytorch.org/whl/cpu
Verify the installation with import torch; import dgl and a simple graph creation test
Register the Jupyter kernel with python -m ipykernel install --user --name=ais_dgl
The dataset consists of three .npy files placed in the data/ directory
Delete venv/ and re-run setup.sh if you encounter version mismatch errors