Setting Up the Environment
Objectives
Clone the repository and run the automated setup script
Understand what setup.sh does at each step
Perform manual installation for CPU-only or custom CUDA setups
Verify that PyTorch, DGL, and the ais_dgl package work together
Register a Jupyter kernel for notebook usage
Quick Start
The fastest way to get started is the automated setup script:
git clone https://github.com/NAICNO/wp7-UC5-ais-classification-gnn.git
cd wp7-UC5-ais-classification-gnn
chmod +x setup.sh
./setup.sh
source venv/bin/activate
What setup.sh Does
The setup script performs these steps in order:
Detect GPU availability: Checks for nvidia-smi and queries the CUDA version to determine whether to install GPU or CPU packages
Create a Python virtual environment: Creates venv/ using python3 -m venv
Upgrade pip: Ensures the latest pip version is available
Install PyTorch: Installs PyTorch with the matching CUDA version (or CPU-only if no GPU is detected)
Install DGL: Installs the Deep Graph Library with the matching CUDA backend
Install the ais_dgl package: Installs the project package in development mode (pip install -e .)
Register Jupyter kernel: Creates an ais_dgl kernel for use in Jupyter notebooks
CUDA Version Matching
PyTorch and DGL must be installed with matching CUDA versions. The setup.sh script handles this automatically. If you see errors about CUDA version mismatches, delete the venv/ directory and run setup.sh again. The script reads the CUDA version from nvidia-smi output and selects the appropriate package index.
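The detection step can be pictured with a small Python sketch. It is not the code in setup.sh, whose exact logic is not shown here. It only illustrates reading the "CUDA Version: X.Y" field that nvidia-smi prints in its header and mapping it to one of the wheel tags used in this guide. The function names and the mapping rule (12.x to cu121, 11.x to cu118, anything else to cpu) are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the 'CUDA Version: X.Y' field, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def select_wheel_tag(version: Optional[Tuple[int, int]]) -> str:
    """Map a CUDA version to a wheel tag (hypothetical rule, not taken from setup.sh)."""
    if version is None:
        return "cpu"  # no GPU detected
    major, _minor = version
    if major >= 12:
        return "cu121"
    if major == 11:
        return "cu118"
    return "cpu"  # assumed fallback for older CUDA releases


# Example header line in the format printed by nvidia-smi
sample = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"
print(select_wheel_tag(parse_cuda_version(sample)))  # prints: cu121
```

On a real machine the input string would come from running nvidia-smi, for example via subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.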
Manual Installation
If you need more control over the installation (e.g., a specific PyTorch version or CPU-only setup), follow these steps.
CPU-Only Installation
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
# Install PyTorch (CPU only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Install DGL (CPU only)
pip install dgl -f https://data.dgl.ai/wheels/repo.html
# Install project package
pip install -e .
GPU Installation (CUDA 12.x)
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
# Install PyTorch with CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Install DGL with CUDA 12.1
pip install dgl -f https://data.dgl.ai/wheels/cu121/repo.html
# Install project package
pip install -e .
GPU Installation (CUDA 11.x)
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
# Install PyTorch with CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Install DGL with CUDA 11.8
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html
# Install project package
pip install -e .
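The three manual variants above differ only in the wheel tag (cpu, cu121, or cu118). As a compact summary, the following sketch builds the pip commands for a given tag from the URLs listed in this section. The helper name install_commands is hypothetical and it only prints the commands; it does not run them.

```python
from typing import List

# URL patterns copied from the manual installation steps in this guide
PYTORCH_INDEX = "https://download.pytorch.org/whl/{tag}"
DGL_WHEELS = {
    "cpu": "https://data.dgl.ai/wheels/repo.html",
    "cu121": "https://data.dgl.ai/wheels/cu121/repo.html",
    "cu118": "https://data.dgl.ai/wheels/cu118/repo.html",
}


def install_commands(tag: str) -> List[str]:
    """Return the pip commands for one of the variants covered in this guide."""
    if tag not in DGL_WHEELS:
        raise ValueError(f"unsupported wheel tag: {tag!r}")
    return [
        "pip install --upgrade pip",
        f"pip install torch torchvision --index-url {PYTORCH_INDEX.format(tag=tag)}",
        f"pip install dgl -f {DGL_WHEELS[tag]}",
        "pip install -e .",
    ]


# Print the commands for the CUDA 11.8 variant
for command in install_commands("cu118"):
    print(command)
```

Using the same tag for both the PyTorch and the DGL line is what keeps the two CUDA builds aligned.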
Verify Installation
Run these checks to confirm everything is installed correctly:
# Check PyTorch and DGL versions
python -c "import torch; import dgl; print(f'PyTorch {torch.__version__}, DGL {dgl.__version__}')"
# Check CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Check that DGL can create a graph
python -c "import dgl; g = dgl.graph(([0,1],[1,2])); print(f'DGL graph: {g.num_nodes()} nodes, {g.num_edges()} edges')"
# Run the test suite
pytest tests/ -v
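If one of the checks above fails with an ImportError, it can help to see at a glance which modules are missing from the active environment. The sketch below uses only the standard library and does not import the packages themselves. The helper name missing_modules is hypothetical, and it assumes the project package is importable under the name ais_dgl.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'ais_dgl' as the import name is an assumption based on the package name in this guide
missing = missing_modules(["torch", "dgl", "ais_dgl"])
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable")
```

A module reported as missing usually means the virtual environment is not activated or the corresponding install step was skipped.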
Verifying DGL and PyTorch Work Together
A quick smoke test to confirm the full pipeline works:
import torch
import dgl
# Create a simple graph
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g = dgl.add_self_loop(g)
# Add node features
g.ndata['feat'] = torch.randn(3, 4)
# Test a GCN layer
from dgl.nn import GraphConv
conv = GraphConv(4, 2)
out = conv(g, g.ndata['feat'])
print(f'Input shape: {g.ndata["feat"].shape}')
print(f'Output shape: {out.shape}')
print('DGL + PyTorch integration OK')
If you have a GPU, also verify GPU tensor operations:
import torch
if torch.cuda.is_available():
    x = torch.randn(3, 4, device='cuda')
    print(f'GPU tensor device: {x.device}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
else:
    print('No GPU available -- CPU mode will be used')
Jupyter Kernel Setup
To use the environment in Jupyter notebooks, register the kernel:
source venv/bin/activate
python -m ipykernel install --user --name=ais_dgl --display-name "AIS DGL (Python 3)"
Verify the kernel is registered:
jupyter kernelspec list
You should see ais_dgl in the list. When opening the notebook, select “AIS DGL (Python 3)” as the kernel.
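The same check can be scripted, for instance in a CI job. The sketch below assumes the JSON layout printed by jupyter kernelspec list --json, where kernel names are keys under a top-level "kernelspecs" object. The helper name kernel_registered and the sample paths are illustrative only.

```python
import json


def kernel_registered(kernelspec_json: str, name: str = "ais_dgl") -> bool:
    """Return True if `name` appears among the kernels in the JSON listing."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return name in specs


# Hand-written sample in the assumed layout; paths are placeholders
sample = json.dumps({
    "kernelspecs": {
        "python3": {"resource_dir": "/path/to/kernels/python3", "spec": {}},
        "ais_dgl": {"resource_dir": "/path/to/kernels/ais_dgl", "spec": {}},
    }
})
print(kernel_registered(sample))  # prints: True
```

In practice the input would be the captured output of jupyter kernelspec list --json rather than a hand-written sample.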
If Jupyter Lab is not installed, add it to the environment:
pip install jupyterlab
jupyter lab --no-browser --ip=127.0.0.1 --port=8888
Data Requirements
Place the AIS dataset files in a data/ directory:
| File | Shape | Description |
|---|---|---|
| | | Feature array: velocity, distance to shore, curvature for 12 time steps |
| | | Label array: 0 (non-fishing) or 1 (fishing) |
| | | Bootstrap split indices for 50 different train/val/test splits |
The total dataset contains ~23,500 samples split into training (14,100), validation (4,700), and test (4,700) sets.
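As a quick arithmetic check on these figures, the sketch below confirms that the three split sizes add up to the stated total and correspond to a 60/20/20 split. It also counts the feature values per sample implied by three features over 12 time steps. The actual array layout is not specified here, so that last figure is only a count, not a shape.

```python
# Split sizes as stated in this guide
splits = {"train": 14_100, "val": 4_700, "test": 4_700}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(f"total samples: {total}")  # prints: total samples: 23500
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%}")  # train: 60%, val: 20%, test: 20%

# Three features (velocity, distance to shore, curvature) over 12 time steps
values_per_sample = 3 * 12
print(f"feature values per sample: {values_per_sample}")  # prints: feature values per sample: 36
```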
Keypoints
Run setup.sh for automated installation with GPU auto-detection
PyTorch and DGL must have matching CUDA versions; the setup script handles this
For CPU-only installations, use the --index-url flag with the CPU wheel URL
Verify the installation with import torch; import dgl and a simple graph creation test
Register the Jupyter kernel with python -m ipykernel install --user --name=ais_dgl
The dataset consists of three .npy files placed in the data/ directory
Delete venv/ and re-run setup.sh if you encounter version mismatch errors