Running Experiments
Objectives
Run the main analysis script
Run long experiments in the background with tmux
Understand the command line options
Usage
1. Activate the Environment
cd ~/wp7-UC1-climate-indices-teleconnection
source venv/bin/activate
2. Run the Main Analysis Script
python scripts/lrbased_teleconnection/main.py \
--data_file dataset/noresm-f-p1000_slow_new_jfm.csv \
--target_feature amo2 \
--modelname LinearRegression \
--max_allowed_features 6 \
--end_lag 50
3. Interactive Exploration with Jupyter
jupyter lab
# Open demonstrator-v1.orchestrator.ipynb
Command Line Options
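| Option | Description | Example |
|---|---|---|
| --data_file | Path to dataset CSV | dataset/noresm-f-p1000_slow_new_jfm.csv |
| --target_feature | Variable to predict | amo2 |
| --modelname | ML model to use | LinearRegression |
| --max_allowed_features | Max features for model | 6 |
| --end_lag | Maximum lag in years | 50 |
| | Lag step size | |
| | Train/test split ratio | |
| --n_ensembles | Number of ensemble runs | 100 |
| | Include mean feature | Flag (no value) |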
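To make the lag and split options more concrete, here is a minimal sketch of what a lag in years and a train/test split ratio usually mean for yearly series. It is not taken from main.py: the helper names lagged_pairs and chronological_split, the variable lag_step, the starting lag of 0, and the toy numbers are all assumptions made for illustration.

```python
def lagged_pairs(predictor, target, lag):
    """Pair predictor[t - lag] with target[t] for every year t where both exist."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [(predictor[t - lag], target[t]) for t in range(lag, len(target))]


def chronological_split(rows, train_ratio):
    """Split rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Toy yearly series (made-up numbers, not from the dataset).
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

# Hypothetical lag grid: every 2nd lag up to a maximum lag of 4 years.
end_lag, lag_step = 4, 2
for lag in range(0, end_lag + 1, lag_step):
    pairs = lagged_pairs(predictor, target, lag)
    train, test = chronological_split(pairs, 0.5)
    print(f"lag={lag}: {len(train)} training rows, {len(test)} test rows")
```

Larger lags leave fewer usable rows, because the first lag years of the target have no matching predictor value. This is one reason a large maximum lag on a short series can give unstable scores.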
Background Training (Long Runs)
For long-running experiments, use tmux sessions:
# Start background training
tmux new-session -d -s training 'cd ~/wp7-UC1-climate-indices-teleconnection && source venv/bin/activate && \
python scripts/lrbased_teleconnection/main.py \
--data_file dataset/noresm-f-p1000_slow_new_jfm.csv \
--target_feature amo2 \
--modelname LRforcedPSO \
--max_allowed_features 6 \
--end_lag 100 \
--n_ensembles 100 2>&1 | tee training.log'
# Monitor progress (training.log is written in the repository directory)
tail -f training.log
# Attach to session
tmux attach -t training
# Detach: Ctrl+B, then D
Example Experiments
Quick Test (Linear Regression)
python scripts/lrbased_teleconnection/main.py \
--data_file dataset/noresm-f-p1000_slow_new_jfm.csv \
--target_feature amo2 \
--modelname LinearRegression \
--max_allowed_features 3 \
--end_lag 30 \
--n_ensembles 5
Full Analysis (Multiple Models)
# Linear Regression
python scripts/lrbased_teleconnection/main.py \
--data_file dataset/noresm-f-p1000_slow_new_jfm.csv \
--target_feature amo2 \
--modelname LinearRegression \
--max_allowed_features 6 \
--end_lag 100
# Random Forest
python scripts/lrbased_teleconnection/main.py \
--data_file dataset/noresm-f-p1000_slow_new_jfm.csv \
--target_feature amo2 \
--modelname RandomForestRegressor \
--max_allowed_features 6 \
--end_lag 100
# XGBoost (GPU accelerated)
python scripts/lrbased_teleconnection/main.py \
--data_file dataset/noresm-f-p1000_slow_new_jfm.csv \
--target_feature amo2 \
--modelname XGBRegressor \
--max_allowed_features 6 \
--end_lag 100
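The three runs above differ only in the value of --modelname, so they can be generated from a list. The following is a small sketch of that idea; it uses only the flags and values shown above, and the helper name build_command is hypothetical. It prints the commands rather than executing them.

```python
import shlex

SCRIPT = "scripts/lrbased_teleconnection/main.py"
DATA_FILE = "dataset/noresm-f-p1000_slow_new_jfm.csv"
MODELS = ["LinearRegression", "RandomForestRegressor", "XGBRegressor"]


def build_command(modelname, target_feature="amo2", max_allowed_features=6, end_lag=100):
    """Return the argument list for one run of main.py (hypothetical helper)."""
    return [
        "python", SCRIPT,
        "--data_file", DATA_FILE,
        "--target_feature", target_feature,
        "--modelname", modelname,
        "--max_allowed_features", str(max_allowed_features),
        "--end_lag", str(end_lag),
    ]


commands = [build_command(model) for model in MODELS]
for cmd in commands:
    # Print only; to execute, pass cmd to subprocess.run(cmd, check=True)
    # from the repository root with the virtual environment activated.
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell quoting problems if a path or value ever contains spaces.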
Result File Format
Results are saved to the results/ directory. Each result file has the following columns:
| Column | Description |
|---|---|
| | Model name |
| | Predicted variable |
| | Maximum lag in years |
| | Correlation coefficient |
| | Mean Absolute Error |
| | Features used by model |
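For reference, the two score columns can be read as follows. This sketch computes a Pearson correlation coefficient and a mean absolute error in plain Python. Whether main.py uses the Pearson definition is an assumption, and the function names and sample values are illustrative only.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values (assumed definition)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up values, not results from the dataset.
observed = [0.1, 0.4, 0.3, 0.8]
predicted = [0.2, 0.3, 0.5, 0.7]
print("correlation:", round(pearson_r(observed, predicted), 3))
print("MAE:", round(mean_absolute_error(observed, predicted), 3))
```

A higher correlation means the predictions follow the shape of the target more closely, while a lower mean absolute error means they are closer in magnitude. The two can disagree, so it helps to compare models on both.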
Keypoints
Run experiments with scripts/lrbased_teleconnection/main.py, setting the dataset, target variable, model, feature limit, and maximum lag on the command line
Use tmux for long-running background experiments
Results are saved to the results/ directory
Use Jupyter notebook for interactive exploration