Machine Learning Methodology for PDE Representation

Objectives

Understand how PDE data is represented as multiple modalities and learned with ML
Learn the two-level autoencoder architecture: modality-specific encoding followed by latent alignment
Understand why Relative Energy Error (REE) is used instead of standard MSE
Know how models are trained, aligned, fine-tuned, and evaluated end-to-end
Appreciate the cross-modal transfer capability enabled by the shared latent space

Key insight: representation learning, not prediction

This project does not train a model to predict PDE solutions from parameters (that would be a neural operator / surrogate model). Instead, it learns a shared geometric structure across modalities – a latent space where different views of the same PDE state are mapped to nearby points. This distinction is fundamental to understanding the methodology.

Methodology Overview

The machine learning methodology is designed to learn compact, resolution-invariant representations of PDE solutions. The core idea is to treat different numerical realizations of the same PDE state as multiple modalities and to learn a shared latent representation that captures their common structure.

The approach proceeds in three levels. The first two form the two-level autoencoder; the third refines the result:

Level 1: Train independent autoencoders per modality (compression)
Level 2: Align the resulting latent spaces on a shared hypersphere (alignment)
Level 3: Fine-tune end-to-end encoder-decoder chains (refinement)

Data Representation

Modalities

Each PDE sample is represented through several modalities, all index-aligned:

Modality	Shape	Description
`u_16`	(N, 16, 16)	Solution field on 16x16 grid
`u_32`	(N, 32, 32)	Solution field on 32x32 grid
`u_64`	(N, 64, 64)	Solution field on 64x64 grid
`u_128`	(N, 128, 128)	Solution field on 128x128 grid
`u_256`	(N, 256, 256)	Solution field on 256x256 grid
`streamfunc`	(N, K)	Streamfunction coefficient vector

Index alignment is critical: sample i in every file corresponds to the same underlying PDE state (same velocity field, same boundary conditions, same physical solution), whether it is stored as a solution field at a given resolution or as a streamfunction coefficient vector.
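A cheap sanity check on index alignment is to confirm that every modality array has the same leading dimension N. The sketch below uses in-memory toy arrays and a hypothetical helper name, `check_index_alignment`; neither is part of the project. A matching sample count is necessary but not sufficient: it cannot prove that row i describes the same PDE state in every file.

```python
import numpy as np

def check_index_alignment(modalities):
    """Return the shared sample count N, or raise if any modality disagrees."""
    counts = {name: arr.shape[0] for name, arr in modalities.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"Modalities are not index-aligned: {counts}")
    return next(iter(counts.values()))

# Toy stand-ins for the real arrays (N = 8 samples, K = 10 coefficients).
N, K = 8, 10
modalities = {f"u_{r}": np.zeros((N, r, r)) for r in (16, 32, 64, 128, 256)}
modalities["streamfunc"] = np.zeros((N, K))

n_samples = check_index_alignment(modalities)
```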

Dataset Splitting

A single, global split is created by create_splits.py and reused for all experiments:

Training set: Used for gradient-based optimization
Validation set: Used for early stopping and hyperparameter selection
Test set: Used for final evaluation only

This guarantees consistent evaluation across modalities, prevents information leakage, and ensures reproducibility. The split indices are saved as .npy files and loaded by every downstream script.
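As an illustration of the idea (not the contents of create_splits.py, which are not shown here), a global split can be built from one seeded permutation of the sample indices. The split fractions, the seed, and the file names below are assumptions made for this sketch.

```python
import os
import tempfile
import numpy as np

def make_splits(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices once and cut them into train/val/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    return {
        "test": perm[:n_test],
        "val": perm[n_test:n_test + n_val],
        "train": perm[n_test + n_val:],
    }

splits = make_splits(100)

# Save each split as its own .npy file, then reload it as a downstream script would.
out_dir = tempfile.mkdtemp()
for name, idx in splits.items():
    np.save(os.path.join(out_dir, f"{name}_idx.npy"), idx)
reloaded = {name: np.load(os.path.join(out_dir, f"{name}_idx.npy")) for name in splits}
```

Because the same index arrays are applied to every modality file, a sample that lands in the test set for one modality is in the test set for all of them.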

Level 1: Modality-Specific Autoencoders

Architecture

Two distinct architectures are used, matched to the structure of each modality:

Convolutional autoencoders (for grid-based solution fields):

Encoder: series of convolutional layers with spatial downsampling (strided convolutions)
Bottleneck: flattened to a dense latent vector of dimension d (typically 32)
Decoder: transposed convolutions with spatial upsampling back to original grid size
One autoencoder is trained per grid resolution (16x16, 32x32, …, 256x256)
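One consequence of training a separate autoencoder per resolution is that encoder depth can differ from grid to grid. The sketch below traces the spatial size through repeated strided convolutions, assuming kernel size 3, stride 2, padding 1, and a 4x4 feature map before flattening. These hyperparameters are illustrative guesses, not values stated by the project.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Spatial output size of one convolution along a single axis."""
    return (n + 2 * padding - kernel) // stride + 1

def downsampling_path(n, min_size=4):
    """Sizes visited when halving an n x n grid until it reaches min_size."""
    sizes = [n]
    while sizes[-1] > min_size:
        sizes.append(conv_out_size(sizes[-1]))
    return sizes

# One path per grid resolution used in the dataset.
paths = {n: downsampling_path(n) for n in (16, 32, 64, 128, 256)}
```

Under these assumptions the 16x16 encoder needs two strided layers and the 256x256 encoder needs six, yet both end in a latent vector of the same dimension d.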

MLP autoencoders (for streamfunction coefficients):

Encoder: fully connected layers with decreasing width
Bottleneck: dense latent vector of dimension d (same as solution autoencoders)
Decoder: fully connected layers with increasing width
Latent whitening regularization encourages isotropic, well-conditioned latent distributions
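The exact form of the whitening regularizer is not given here. One common formulation, used below purely as an assumption, penalizes the deviation of the batch latent covariance from the identity matrix. The function name and the normalization by d are choices made for this sketch.

```python
import numpy as np

def whitening_penalty(z):
    """Squared Frobenius distance between cov(z) and the identity, divided by d."""
    z_centered = z - z.mean(axis=0, keepdims=True)
    cov = z_centered.T @ z_centered / (z.shape[0] - 1)
    d = z.shape[1]
    return float(np.sum((cov - np.eye(d)) ** 2) / d)

rng = np.random.default_rng(0)
z_iso = rng.standard_normal((5000, 32))        # roughly isotropic latents
z_aniso = z_iso * np.linspace(0.1, 3.0, 32)    # per-dimension scales differ widely

penalty_iso = whitening_penalty(z_iso)
penalty_aniso = whitening_penalty(z_aniso)
```

The isotropic batch yields a penalty near zero, while the anisotropic batch is penalized heavily. This is the behaviour a whitening term is meant to encourage.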

Loss Function: Relative Energy Error (REE)

Standard MSE scales with the square of the solution amplitude, so large-magnitude solutions dominate the loss. Instead, the project uses Relative Energy Error:

REE(x, x_hat) = || x - x_hat ||^2 / || x ||^2

This is scale-invariant: a 1% relative reconstruction error (in norm) gives the same REE of 10^-4 whether the solution has magnitude 0.01 or 100. REE is used for both training loss and evaluation metrics throughout the project.
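A minimal NumPy sketch of the formula follows, computed per sample with the norm taken over all non-batch axes. The function name `ree` and the small `eps` guard against all-zero samples are assumptions of this sketch, not necessarily what the project code does.

```python
import numpy as np

def ree(x, x_hat, eps=1e-12):
    """Per-sample REE: ||x - x_hat||^2 / ||x||^2 over all non-batch axes."""
    axes = tuple(range(1, x.ndim))
    num = np.sum((x - x_hat) ** 2, axis=axes)
    den = np.sum(x ** 2, axis=axes)
    return num / (den + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
x_hat = 1.01 * x                      # reconstruction that is off by 1% in norm

ree_small = ree(0.01 * x, 0.01 * x_hat)   # small-amplitude solutions
ree_large = ree(100.0 * x, 100.0 * x_hat) # large-amplitude solutions
```

Both calls return the same values for every sample, which is the scale invariance the formula is chosen for.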

Why not standard MSE?

PDE solutions can vary enormously in magnitude depending on the diffusion coefficient and velocity field parameters. A fixed MSE threshold would be meaningless: 0.001 MSE might be excellent for a large-amplitude solution but catastrophic for a small one. REE normalizes by the solution energy, making error comparisons meaningful across the parameter space.
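The scaling argument can be written out explicitly. Rescaling a solution and its reconstruction by a common factor c multiplies MSE by c^2 but leaves REE unchanged. The symbols c and n (the number of grid values) are notation introduced here for illustration.

```latex
\mathrm{MSE}(c\,x,\; c\,\hat{x})
  = \frac{1}{n}\,\lVert c\,x - c\,\hat{x} \rVert^2
  = c^2\,\mathrm{MSE}(x, \hat{x}),
\qquad
\mathrm{REE}(c\,x,\; c\,\hat{x})
  = \frac{c^2\,\lVert x - \hat{x} \rVert^2}{c^2\,\lVert x \rVert^2}
  = \mathrm{REE}(x, \hat{x}).
```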

Training Details

Each modality autoencoder is trained independently
Latent dimension is fixed across all modalities (typically d = 32)
Validation REE is monitored for early stopping
Trained encoders and decoders are saved separately for later use in the alignment pipeline
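Early stopping on validation REE can be sketched with a small helper that tracks the best value seen so far. The class name, the `patience` and `min_delta` parameters, and the validation curve below are all made up for illustration. The project's actual stopping rule is not specified here.

```python
class EarlyStopping:
    """Stop when validation REE has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_ree):
        """Record one epoch; return True when training should stop."""
        if val_ree < self.best - self.min_delta:
            self.best = val_ree
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation curve: improves, then stalls.
val_curve = [0.20, 0.10, 0.05, 0.06, 0.055, 0.07, 0.08]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_ree in enumerate(val_curve):
    if stopper.step(epoch, val_ree):
        stopped_at = epoch
        break
```

In a real training loop the encoder and decoder weights from `best_epoch` would be the ones saved for the alignment stage.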

Level 2: Latent-Space Alignment

Motivation

After Level 1, each modality has its own encoder producing d-dimensional latent vectors (typically d = 32). However, these latent spaces are not comparable: the encoder for the 32x32 grid and the encoder for the 128x128 grid have no reason to produce similar latent vectors for the same PDE sample. The second level corrects this.

Alignment Strategy

The alignment procedure works as follows:

Project onto the unit hypersphere: Each modality’s latent vectors are L2-normalized, placing them on the surface of the unit sphere in R^d
Define a joint latent: For each sample, compute the normalized mean of all modality latents as the “consensus” representation
Gradual target shifting: During training, the reconstruction target smoothly transitions from each modality’s own latent (self-reconstruction) to the joint latent (cross-modal agreement)

This enforces three properties simultaneously:

Geometric consistency: All modalities share the same latent geometry (the hypersphere)
Cross-modal agreement: Encoding any modality of sample i produces a similar latent vector
Information preservation: Each modality can still be reconstructed from its aligned latent
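The three alignment steps above can be sketched in NumPy. The interpolation rule (a convex combination that is re-normalized onto the sphere) and the linear schedule are assumptions, since the text only states that the target transitions smoothly. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Project vectors onto the unit hypersphere along the last axis."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def joint_latent(latents):
    """Normalized mean over modalities; latents has shape (M, N, d)."""
    return l2_normalize(latents.mean(axis=0))

def shifted_target(own, joint, alpha):
    """alpha = 0 gives the modality's own latent, alpha = 1 the joint latent."""
    return l2_normalize((1.0 - alpha) * own + alpha * joint)

def alpha_schedule(epoch, n_epochs):
    """Linear ramp from 0 to 1 over training (assumed schedule)."""
    return min(1.0, epoch / max(1, n_epochs - 1))

rng = np.random.default_rng(0)
latents = l2_normalize(rng.standard_normal((3, 5, 32)))  # M=3 modalities, N=5 samples
joint = joint_latent(latents)

n_epochs = 10
targets_start = shifted_target(latents[0], joint, alpha_schedule(0, n_epochs))
targets_end = shifted_target(latents[0], joint, alpha_schedule(n_epochs - 1, n_epochs))
```

At the first epoch the target equals the modality's own latent, and at the last epoch it equals the joint latent. Every intermediate target stays on the unit sphere.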

Why the Unit Hypersphere?

Normalizing latents onto the sphere prevents magnitude collapse (where the model shrinks latents to near-zero to minimize alignment loss) and provides a natural distance metric (cosine similarity). It also makes the alignment loss independent of the overall scale of latent activations.
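Both claims can be checked numerically on random vectors. The sketch below uses toy data and hypothetical helper names. It shows that shrinking un-normalized latents lowers a squared-distance alignment loss without improving agreement, whereas the same loss on the sphere is unaffected and equals 2 - 2 cos(theta).

```python
import numpy as np

def normalize(z):
    """Project vectors onto the unit hypersphere along the last axis."""
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def mean_sq_dist(p, q):
    """Mean squared Euclidean distance between paired rows."""
    return float(np.mean(np.sum((p - q) ** 2, axis=-1)))

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 32))   # latents from one modality (toy data)
b = rng.standard_normal((100, 32))   # latents from another modality (toy data)

raw_loss = mean_sq_dist(a, b)
shrunk_loss = mean_sq_dist(0.01 * a, 0.01 * b)            # collapse "helps" here
sphere_loss = mean_sq_dist(normalize(a), normalize(b))
sphere_loss_shrunk = mean_sq_dist(normalize(0.01 * a), normalize(0.01 * b))

# On the unit sphere, squared distance and cosine similarity carry the same information.
cosine = np.sum(normalize(a) * normalize(b), axis=-1)
sq_dist = np.sum((normalize(a) - normalize(b)) ** 2, axis=-1)
```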

Level 3: End-to-End Fine-Tuning

After alignment, the full encode-decode chains can be fine-tuned:

Encoder fine-tuning (finetune_encoder_to_latent.py): Trains the path from raw modality input directly to the joint latent, bypassing the intermediate Level 1 latent
Decoder fine-tuning (finetune_decoder_from_latent.py): Trains the path from the joint latent directly to the modality output

Fine-tuning corrects accumulated approximation errors from the two-level procedure and improves:

Numerical stability of the encode-decode chain
Cross-modal reconstruction accuracy (encode one modality, decode another)
Direct usability of the joint latent space for downstream tasks

Evaluation Methodology

Quantitative Metrics

Relative Energy Error (REE) is computed for:

Encoding accuracy: REE between the encoded sample and the joint latent target
Decoding accuracy: REE between the decoded reconstruction and the original modality data
Cross-modal accuracy: REE when encoding one modality and decoding another

All metrics are reported separately for train, validation, and test splits.
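The decoding and cross-modal metrics can be collected in one table indexed by (source, target) modality. The sketch below assumes encoders and decoders are plain callables stored in dictionaries and uses trivial toy functions in place of trained networks. The names `cross_modal_ree` and `mean_ree` are hypothetical.

```python
import numpy as np

def mean_ree(x, x_hat):
    """Mean over samples of ||x - x_hat||^2 / ||x||^2."""
    axes = tuple(range(1, x.ndim))
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=axes) / np.sum(x ** 2, axis=axes)))

def cross_modal_ree(encoders, decoders, data):
    """Mean REE for every (source, target) pair of modalities."""
    table = {}
    for src, encode in encoders.items():
        z = encode(data[src])
        for tgt, decode in decoders.items():
            table[(src, tgt)] = mean_ree(data[tgt], decode(z))
    return table

# Toy modalities: "b" is the same state as "a", stored at twice the amplitude.
rng = np.random.default_rng(0)
state = rng.standard_normal((20, 4))
data = {"a": state, "b": 2.0 * state}
encoders = {"a": lambda x: x, "b": lambda x: x / 2.0}
decoders = {"a": lambda z: z, "b": lambda z: 2.02 * z}   # decoder "b" is 1% off

table = cross_modal_ree(encoders, decoders, data)
```

Diagonal entries such as ("a", "a") measure decoding accuracy, and off-diagonal entries such as ("a", "b") measure cross-modal accuracy. Running the function once per split gives the train, validation, and test numbers.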

Qualitative Evaluation

Cross-modal reconstructions: Encode one modality, decode into all others, visualize side-by-side
Visual inspection: Check for systematic artifacts, resolution-dependent smoothing, or modality-specific biases

Alignment Diagnostics

Additional analyses from analyze_latent_alignment.py:

Pairwise latent distance matrices between modalities
REE after centering (mean subtraction) to remove constant offsets and isolate rotational differences
REE after optimal Procrustes alignment to measure best-case agreement
Deviation of each modality from the canonical mean latent
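The centering and Procrustes diagnostics can be sketched as follows, assuming an orthogonal Procrustes fit (rotation or reflection only, no scaling) computed from an SVD. This is an illustration on synthetic latents, not the code of analyze_latent_alignment.py, and the function names are hypothetical.

```python
import numpy as np

def latent_ree(z_a, z_b):
    """REE of z_a relative to z_b, taken over the whole latent matrix."""
    return float(np.sum((z_a - z_b) ** 2) / np.sum(z_b ** 2))

def center(z):
    """Subtract the per-dimension mean over samples."""
    return z - z.mean(axis=0, keepdims=True)

def procrustes_rotation(z_a, z_b):
    """Orthogonal R minimizing ||z_a @ R - z_b||_F."""
    u, _, vt = np.linalg.svd(z_a.T @ z_b)
    return u @ vt

# Synthetic case: z_a is a rotated and shifted copy of z_b.
rng = np.random.default_rng(0)
z_b = rng.standard_normal((200, 8))
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthogonal matrix
z_a = z_b @ q.T + 0.5

ree_raw = latent_ree(z_a, z_b)
ree_centered = latent_ree(center(z_a), center(z_b))
r = procrustes_rotation(center(z_a), center(z_b))
ree_procrustes = latent_ree(center(z_a) @ r, center(z_b))
```

In this synthetic case the Procrustes REE drops to numerical zero because the two latent sets differ only by an offset and a rotation. On real latents, a large gap between the centered and Procrustes values would suggest that the modalities agree up to a rotation that training did not remove.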

Experimental Philosophy

This methodology is intentionally modular, explicit, and diagnostic:

Each stage can be run, inspected, and debugged independently
All intermediate artifacts (latent vectors, metrics, plots) are saved to disk
The emphasis is on understanding what the model learns, not on optimizing a leaderboard metric

The pipeline is designed so that researchers can modify one stage (e.g., swap the autoencoder architecture) without disrupting the others.

Keypoints

The ML approach is representation learning (shared latent space), not prediction (surrogate model)
Each grid resolution, and the streamfunction coefficient vector, is treated as a separate modality with its own autoencoder
Relative Energy Error (REE) provides scale-invariant loss and evaluation
A second-level alignment maps all modality latents onto a shared unit hypersphere
End-to-end fine-tuning corrects accumulated errors from the two-level approach
Cross-modal transfer (encode one modality, decode another) is the central demonstrated capability
The pipeline is modular: each stage is independent and produces inspectable artifacts