# Setting Up the Environment

```{objectives}
- Clone the repository and run the automated setup script
- Understand what `setup.sh` does at each step
- Perform manual installation for CPU-only or custom CUDA setups
- Verify that PyTorch, DGL, and the `ais_dgl` package work together
- Register a Jupyter kernel for notebook usage
```

## Quick Start

The fastest way to get started is the automated setup script:

```bash
git clone https://github.com/NAICNO/wp7-UC5-ais-classification-gnn.git
cd wp7-UC5-ais-classification-gnn
chmod +x setup.sh
./setup.sh
source venv/bin/activate
```

## What `setup.sh` Does

The setup script performs these steps in order:

1. **Detect GPU availability**: Checks for `nvidia-smi` and queries the CUDA version to decide whether to install GPU or CPU packages
2. **Create a Python virtual environment**: Creates `venv/` using `python3 -m venv`
3. **Upgrade pip**: Ensures the latest pip version is available
4. **Install PyTorch**: Installs PyTorch built for the detected CUDA version (or the CPU-only build if no GPU is detected)
5. **Install DGL**: Installs the Deep Graph Library with the matching CUDA backend
6. **Install the `ais_dgl` package**: Installs the project package in editable (development) mode (`pip install -e .`)
7. **Register Jupyter kernel**: Creates an `ais_dgl` kernel for use in Jupyter notebooks

```{admonition} CUDA Version Matching
:class: tip

PyTorch and DGL must be installed with matching CUDA versions. The `setup.sh` script handles this automatically: it reads the CUDA version from the `nvidia-smi` output and selects the appropriate package index. If you see errors about CUDA version mismatches, delete the `venv/` directory and run `setup.sh` again.
```

## Manual Installation

If you need more control over the installation (e.g., a specific PyTorch version or a CPU-only setup), follow these steps.
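Before choosing one of the variants below, it can help to check which CUDA version the driver reports. The following Python sketch shows one way to do that. It is not taken from `setup.sh`: the function names are hypothetical, and the mapping from CUDA major version to the `cpu`, `cu118`, and `cu121` tags is an assumption chosen to match the three variants in this section.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Return (major, minor) from the nvidia-smi banner text, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_tag(cuda_version):
    """Map a CUDA version to a wheel tag (assumed mapping, see lead-in)."""
    if cuda_version is None:
        return "cpu"
    major, _minor = cuda_version
    if major >= 12:
        return "cu121"
    if major == 11:
        return "cu118"
    # Older CUDA versions are not covered in this section; fall back to CPU.
    return "cpu"


def detect_wheel_tag():
    """Run nvidia-smi if it is on PATH and suggest a wheel tag."""
    if shutil.which("nvidia-smi") is None:
        return "cpu"
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return "cpu"
    return wheel_tag(parse_cuda_version(result.stdout))


if __name__ == "__main__":
    print(f"Suggested wheel tag: {detect_wheel_tag()}")
```

The parsing and mapping are kept in separate pure functions so they can be checked without a GPU. Treat the suggested tag as a starting point and confirm it against the PyTorch and DGL installation pages for your driver.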
### CPU-Only Installation

```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch (CPU only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Install DGL (CPU only)
pip install dgl -f https://data.dgl.ai/wheels/repo.html

# Install project package
pip install -e .
```

### GPU Installation (CUDA 12.x)

```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install DGL with CUDA 12.1
pip install dgl -f https://data.dgl.ai/wheels/cu121/repo.html

# Install project package
pip install -e .
```

### GPU Installation (CUDA 11.x)

```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install DGL with CUDA 11.8
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html

# Install project package
pip install -e .
```

## Verify Installation

Run these checks to confirm everything is installed correctly:

```bash
# Check PyTorch and DGL versions
python -c "import torch; import dgl; print(f'PyTorch {torch.__version__}, DGL {dgl.__version__}')"

# Check CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Check that DGL can create a graph
python -c "import dgl; g = dgl.graph(([0,1],[1,2])); print(f'DGL graph: {g.num_nodes()} nodes, {g.num_edges()} edges')"

# Run the test suite
pytest tests/ -v
```

### Verifying DGL and PyTorch Work Together

A quick smoke test to confirm the full pipeline works:

```python
import torch
import dgl
from dgl.nn import GraphConv

# Create a simple graph
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g = dgl.add_self_loop(g)

# Add node features
g.ndata['feat'] = torch.randn(3, 4)

# Test a GCN layer
conv = GraphConv(4, 2)
out = conv(g, g.ndata['feat'])

print(f'Input shape: {g.ndata["feat"].shape}')
print(f'Output shape: {out.shape}')
print('DGL + PyTorch integration OK')
```

If you have a GPU, also verify GPU tensor operations:

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(3, 4, device='cuda')
    print(f'GPU tensor device: {x.device}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
else:
    print('No GPU available -- CPU mode will be used')
```

## Jupyter Kernel Setup

To use the environment in Jupyter notebooks, register the kernel:

```bash
source venv/bin/activate
python -m ipykernel install --user --name=ais_dgl --display-name "AIS DGL (Python 3)"
```

Verify the kernel is registered:

```bash
jupyter kernelspec list
```

You should see `ais_dgl` in the list. When opening the notebook, select **"AIS DGL (Python 3)"** as the kernel.
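The same kernel check can be scripted, for example as part of an automated environment test. The sketch below is an illustration rather than part of the project: the helper names are hypothetical, and it relies on the JSON output of `jupyter kernelspec list --json`, which keys the registered kernels under `kernelspecs`.

```python
import json
import subprocess


def kernel_registered(listing_json, name="ais_dgl"):
    """Return True if `name` appears in `jupyter kernelspec list --json` output."""
    specs = json.loads(listing_json).get("kernelspecs", {})
    return name in specs


def list_kernels_json():
    """Return the raw JSON text printed by `jupyter kernelspec list --json`."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        found = kernel_registered(list_kernels_json())
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Jupyter is missing or the command failed; nothing to check.
        print("Could not run `jupyter kernelspec list`")
    else:
        print("ais_dgl kernel registered" if found else "ais_dgl kernel not found")
```

Run it with the virtual environment activated so that the `jupyter` executable from `venv/` is the one being queried.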
If Jupyter Lab is not installed, add it to the environment and start it:

```bash
pip install jupyterlab
jupyter lab --no-browser --ip=127.0.0.1 --port=8888
```

## Data Requirements

Place the AIS dataset files in a `data/` directory. In the shapes below, `N` is the number of samples.

| File | Shape | Description |
|------|-------|-------------|
| `X_ts12.npy` | `(N, 3, 12)` | Feature array: three channels (velocity, distance to shore, curvature) over 12 time steps |
| `y_ts12.npy` | `(N,)` | Label array: 0 (non-fishing) or 1 (fishing) |
| `bidx_ts12.npy` | `(50, N)` | Bootstrap split indices for 50 different train/val/test splits |

The total dataset contains ~23,500 samples split into training (14,100), validation (4,700), and test (4,700) sets.

```{keypoints}
- Run `setup.sh` for automated installation with GPU auto-detection
- PyTorch and DGL must have matching CUDA versions, which the setup script handles
- For CPU-only installations, use the `--index-url` flag with the CPU wheel URL
- Verify the installation with `import torch; import dgl` and a simple graph creation test
- Register the Jupyter kernel with `python -m ipykernel install --user --name=ais_dgl`
- The dataset consists of three `.npy` files placed in the `data/` directory
- Delete `venv/` and re-run `setup.sh` if you encounter version mismatch errors
```
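As a final check on the files listed under Data Requirements, the shapes can be compared against the table before any training run. This is a minimal sketch, assuming NumPy is available in the environment. The function names `check_shapes` and `check_data_dir` are hypothetical and not part of the `ais_dgl` package.

```python
from pathlib import Path


def check_shapes(x_shape, y_shape, bidx_shape, n_splits=50):
    """Return a list of problems; an empty list means the shapes match the table."""
    problems = []
    if len(x_shape) != 3 or tuple(x_shape[1:]) != (3, 12):
        problems.append(f"X_ts12 should be (N, 3, 12), got {tuple(x_shape)}")
    # N is taken from the first axis of the feature array.
    n = x_shape[0] if len(x_shape) > 0 else None
    if tuple(y_shape) != (n,):
        problems.append(f"y_ts12 should be ({n},), got {tuple(y_shape)}")
    if tuple(bidx_shape) != (n_splits, n):
        problems.append(
            f"bidx_ts12 should be ({n_splits}, {n}), got {tuple(bidx_shape)}"
        )
    return problems


def check_data_dir(data_dir="data"):
    """Open the three .npy files and compare their shapes with the table."""
    import numpy as np  # imported here so check_shapes has no dependencies

    data_dir = Path(data_dir)
    # mmap_mode="r" maps the files instead of reading the full arrays into memory.
    x = np.load(data_dir / "X_ts12.npy", mmap_mode="r")
    y = np.load(data_dir / "y_ts12.npy", mmap_mode="r")
    bidx = np.load(data_dir / "bidx_ts12.npy", mmap_mode="r")
    return check_shapes(x.shape, y.shape, bidx.shape)
```

Calling `check_data_dir()` from the directory that contains `data/` returns an empty list when all three files agree with the table, and a list of human-readable mismatches otherwise.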