UC1 — Climate Indices Teleconnection Analysis

Objectives

  • Understand what climate teleconnections are and why ML is useful for discovering them

  • Learn about the NorESM1-F simulation data and the 65 climate indices

  • Understand the ML pipeline: lagged features, ensemble models, wavelet filtering

  • Know how to run both interactive (Orchestrator) and CLI workflows

Repository: wp7-UC1-climate-indices-teleconnection

Tutorial: https://naicno.github.io/wp7-UC1-climate-indices-teleconnection/

Contributors: Klaus Johannsen, Odd Helge Otterå, Adrian Evensen, Hasan Asyari Arief (NORCE Research)

The Problem

Climate teleconnections are large-scale patterns that link distant regions’ weather and climate. They are central to decadal climate prediction, but identifying them from observational data is hard. Traditional approaches rely on expert-curated index pairs and linear statistics.

UC1 tests whether machine learning can systematically discover teleconnection relationships across a large set of climate indices — including non-linear ones — and use them for multi-decadal forecasts.

Data

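The team worked with three long-term climate simulations from the Norwegian Earth System Model (NorESM1-F), each run under a different forcing scenario. The two solar-forced runs span 850–2005 AD; the pre-industrial control covers 1000 years under constant forcing: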

Simulation              Forcing                    Period
Low Solar               Reduced solar irradiance   850–2005 AD
High Solar              Enhanced solar irradiance  850–2005 AD
Pre-industrial Control  Constant forcing           1000 years

These simulations provide 65 climate indices covering:

  • Surface temperatures

  • Sea surface temperatures

  • Sea ice concentration

  • Precipitation

  • Atmospheric pressure

  • Ocean circulation

ML Pipeline

        graph LR
    A[65 Climate Indices] --> B[Normalize 0-100]
    B --> C[Generate Lagged Features<br>up to 150-year lags]
    C --> D[Train Ensemble Models]
    D --> E[Feature Importance]
    E --> F[Top-N Selection]
    F --> G[Evaluate: Pearson r + MAE]
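
The data-handling stages of the diagram above (normalization, lag generation, and top-N selection) can be illustrated with a minimal standard-library sketch. Several things here are assumptions rather than the repository's code: min-max scaling is assumed as the 0–100 rule, the index names `amv` and `pdv` and their values are toy data, the importance scores are made-up stand-ins for what trained models would report, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize_0_100(series: List[float]) -> List[float]:
    """Min-max scale a series to 0-100 (assumed scaling rule)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series has no variation; map it to the midpoint.
        return [50.0 for _ in series]
    return [100.0 * (x - lo) / (hi - lo) for x in series]


def lagged_features(
    indices: Dict[str, List[float]], target: str, max_lag: int
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Build lagged predictor rows and the aligned target values.

    The row for time step t holds every index at t-1 ... t-max_lag,
    and the label is the target index at t.
    """
    n = len(indices[target])
    lags = range(1, max_lag + 1)
    names = [f"{name}_lag{lag}" for name in indices for lag in lags]
    rows, labels = [], []
    for t in range(max_lag, n):
        rows.append([indices[name][t - lag] for name in indices for lag in lags])
        labels.append(indices[target][t])
    return names, rows, labels


def top_n_features(
    names: List[str], importance_runs: List[List[float]], n: int
) -> List[str]:
    """Average per-feature importance over runs and keep the n highest."""
    runs = len(importance_runs)
    mean_imp = [sum(run[i] for run in importance_runs) / runs for i in range(len(names))]
    ranked = sorted(zip(names, mean_imp), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Toy annual values for two hypothetical indices.
raw = {
    "amv": [0.1, 0.3, 0.2, 0.5, 0.4, 0.6],
    "pdv": [1.0, 0.8, 1.2, 0.9, 1.1, 1.3],
}
scaled = {name: normalize_0_100(values) for name, values in raw.items()}
names, X, y = lagged_features(scaled, target="amv", max_lag=2)

# Made-up importance scores from two ensemble runs, one value per feature.
importance_runs = [[0.5, 0.1, 0.3, 0.1], [0.4, 0.2, 0.3, 0.1]]
selected = top_n_features(names, importance_runs, n=2)
print(names)
print(selected)
```

If every index were lagged at every year up to the maximum, 65 indices and 150 lags would give 65 × 150 = 9,750 candidate features per target, which is why a top-N selection step is useful before evaluation.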
    

The framework:

  1. Normalizes all indices to a 0–100 scale

  2. Generates lagged features to capture temporal dependencies (up to 150-year lags)

  3. Trains an ensemble of five model types:

Model              Type
Linear Regression  Baseline
Random Forest      Ensemble
XGBoost            Gradient Boosting
MLP                Neural Network
LRforcedPSO        PSO-constrained Linear Regression

  4. Averages feature importance across ensemble runs and selects the top-N features

  5. Evaluates performance using Pearson correlation and mean absolute error (MAE)

  6. Optionally applies a Morlet wavelet bandpass filter to isolate specific frequency bands
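
To make the wavelet step concrete, here is a minimal standard-library sketch of a Morlet bandpass filter. It is an illustration only: the repository may implement the filter differently (for example with FFTs or a wavelet library). The sketch follows the scale-summation reconstruction described by Torrence and Compo (1998), and assumes a Morlet centre frequency of 6 with the matching reconstruction constant 0.776. The function name, the `dj` spacing, and the synthetic test signal are all hypothetical.

```python
import math
from typing import List, Sequence

OMEGA0 = 6.0     # Morlet centre frequency (a common default, assumed here)
C_DELTA = 0.776  # Reconstruction constant for OMEGA0 = 6 (Torrence & Compo, 1998)


def morlet_bandpass(x: Sequence[float], dt: float, period_min: float,
                    period_max: float, dj: float = 0.125) -> List[float]:
    """Keep variability with periods between period_min and period_max.

    The series is convolved with Morlet wavelets at scales spanning the band,
    and the real parts are summed over scales. For a real input only the real
    (cosine) part of the wavelet contributes to that sum.
    """
    # Conversion between wavelet scale and equivalent Fourier period.
    fourier_factor = 4.0 * math.pi / (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2))
    s0 = period_min / fourier_factor
    n_scales = int(math.log2(period_max / period_min) / dj) + 1
    n = len(x)
    recon = [0.0] * n
    for j in range(n_scales):
        s = s0 * 2.0 ** (j * dj)
        half = int(math.ceil(5.0 * s / dt))  # cut the Gaussian envelope at 5 scale lengths
        kernel = [
            math.sqrt(dt / s) * math.pi ** -0.25
            * math.cos(OMEGA0 * k * dt / s) * math.exp(-0.5 * (k * dt / s) ** 2)
            for k in range(-half, half + 1)
        ]
        for i in range(n):
            acc = 0.0
            for offset, w in enumerate(kernel):
                idx = i + offset - half
                if 0 <= idx < n:  # samples beyond the ends are treated as zero
                    acc += x[idx] * w
            recon[i] += acc / math.sqrt(s)
    factor = dj * math.sqrt(dt) / (C_DELTA * math.pi ** -0.25)
    return [factor * v for v in recon]


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


# Synthetic annual series: a 20-year cycle plus a 100-year cycle.
fast = [math.sin(2.0 * math.pi * k / 20.0) for k in range(400)]
slow = [math.sin(2.0 * math.pi * k / 100.0) for k in range(400)]
signal = [f + s for f, s in zip(fast, slow)]

filtered = morlet_bandpass(signal, dt=1.0, period_min=15.0, period_max=30.0)

# Compare away from the ends, where the zero-padding distorts the result.
corr_fast = _pearson(filtered[100:300], fast[100:300])
corr_slow = _pearson(filtered[100:300], slow[100:300])
print(f"correlation with 20-year cycle:  {corr_fast:.3f}")
print(f"correlation with 100-year cycle: {corr_slow:.3f}")
```

Because the sum covers only the scales inside the band, the filtered series follows the phase of the retained cycle but its amplitude is somewhat attenuated. Values near the start and end of the record are less reliable.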

Results

In total, 42,613 individual experiments were run across all model–target–lag combinations:

  • ML models achieved correlation coefficients exceeding 0.7 for more than 20 target climate indices

  • Statistically significant teleconnections were identified across multi-decadal timescales

  • Results support 10–50 year forecasts of patterns such as Atlantic Multidecadal Variability (AMV) and Pacific Decadal Variability (PDV)
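
The correlation figures above are Pearson coefficients between predicted and simulated index values, and the pipeline pairs them with MAE. As a minimal sketch of how the two metrics are computed, the following uses only the standard library. The function names and the sample values are hypothetical, chosen on the 0–100 scale used for the normalized indices.

```python
import math
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between target values and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between target values and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up target values and predictions on the 0-100 scale.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 52.0]

r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"r = {r:.3f}, MAE = {mae:.2f}, exceeds 0.7 threshold: {r > 0.7}")
```

The two metrics are complementary: Pearson r measures whether predictions rise and fall with the target, while MAE measures how far off they are in the index's own units, so a model can score well on one and poorly on the other.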

Infrastructure

UC1 runs on NAIC Orchestrator VMs. The notebook demonstrator-v1.orchestrator.ipynb supports interactive exploration, and a CLI supports automated parameter sweeps. This makes UC1 a reference for running large-scale ML experiments on NAIC infrastructure in both modes.

Quick Start

git clone https://github.com/NAICNO/wp7-UC1-climate-indices-teleconnection.git
cd wp7-UC1-climate-indices-teleconnection
bash setup.sh
jupyter notebook demonstrator-v1.orchestrator.ipynb

Keypoints

  • Teleconnections are large-scale patterns of climate variability that link distant regions

  • 65 climate indices from NorESM1-F simulations spanning 850–2005 AD

  • ML ensemble of 5 model types identifies teleconnection relationships

  • 42,613 experiments, with correlations above 0.7 for more than 20 target indices

  • Supports 10–50 year forecasts of AMV and PDV patterns

  • Provides both interactive notebooks and CLI parameter sweeps on NAIC Orchestrator