Analyzing Results

Objectives

  • Interpret Relative Energy Error (REE) values and understand what constitutes good vs. poor reconstruction

  • Analyze latent-space alignment quality using pairwise metrics and Procrustes diagnostics

  • Evaluate cross-modal reconstruction fidelity and identify failure modes

  • Use visualization scripts to build intuition about learned representations

  • Distinguish between alignment quality issues and reconstruction quality issues

What to expect from a successful experiment

A well-trained pipeline should exhibit:

  • REE below 0.01 (1%) for same-modality reconstruction (encode and decode the same resolution)

  • REE below 0.05 (5%) for cross-modal reconstruction (encode one resolution, decode another)

  • Pairwise latent REE that decreases substantially after Procrustes alignment

  • Test set performance comparable to validation set (no overfitting)

Scope of Analysis

The analysis addresses three complementary questions:

  1. Are modality-specific latent representations well aligned? (geometric consistency)

  2. Does the joint latent representation preserve sufficient information for reconstruction? (information preservation)

  3. How do errors vary across modalities, resolutions, and data splits? (generalization)

The visualization-only script plot_solutions.py (called in notebook section “3. Plot solutions”) plays a supporting role and is not part of the quantitative evaluation.


Latent-Space Alignment Analysis

Pairwise Modality Consistency

Script: analyze_latent_alignment.py
Notebook section: “7. Analyze the alignments”

This analysis operates on second-level modality latents (lat_2_*_ld32.npy) and computes:

  • Pairwise REE between all modality pairs (raw latent vectors)

  • REE after centering (mean subtraction) to remove global offset differences

  • REE after Procrustes alignment to measure best-case geometric agreement
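These three quantities reduce to a few lines of NumPy and SciPy. The helper below is a minimal sketch, assuming each modality’s latents are loaded as an (n_samples, latent_dim) array; it illustrates the computation rather than reproducing the actual code in analyze_latent_alignment.py.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def ree(a, b):
    """Relative energy error of a with respect to b."""
    return np.sum((a - b) ** 2) / np.sum(b ** 2)

def pairwise_ree(za, zb):
    """Raw, centered, and Procrustes-aligned REE between two (n, d) latent sets."""
    raw = ree(za, zb)
    # Centering removes global offset differences between the two latent spaces
    za_c, zb_c = za - za.mean(axis=0), zb - zb.mean(axis=0)
    centered = ree(za_c, zb_c)
    # Optimal rotation/reflection mapping za_c onto zb_c (best-case agreement)
    R, _ = orthogonal_procrustes(za_c, zb_c)
    procrustes = ree(za_c @ R, zb_c)
    return raw, centered, procrustes
```

Running this over every pair of modality latent files yields the pairwise REE matrices described above.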

Outputs:

  • Pairwise REE matrices (CSV) for each alignment type

  • REE of each modality relative to the canonical mean latent

  • The canonical joint latent representation (lat_3_ld32.npy)

Interpreting Alignment Metrics

| Observation | Interpretation | Action |
| --- | --- | --- |
| Low raw REE (<0.01) | Strong intrinsic alignment | Latent spaces naturally agree |
| High raw REE, low Procrustes REE | Differences are primarily rotational | Alignment is working correctly |
| High Procrustes REE | Fundamental structural disagreement | Check autoencoder training quality |
| One modality with much higher REE-to-mean | Outlier modality | May need more training epochs or a different architecture |
| Train REE << Test REE | Overfitting in alignment | Reduce model capacity or add regularization |

What Procrustes alignment tells you

Procrustes analysis finds the optimal rotation and reflection to align two point clouds. If the REE drops dramatically after Procrustes, the two latent spaces have learned similar structure but different orientations. The alignment network’s job is essentially to learn this rotation, so a large Procrustes improvement is expected before alignment training and should diminish after it.
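A small synthetic demonstration of this effect: the cloud B below is an exact rotation of A, so the raw REE is large while the Procrustes-aligned REE is essentially zero. The closed-form solution via the SVD of AᵀB is the standard orthogonal Procrustes result; the data here are synthetic, not pipeline latents.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 32))                 # latent cloud of modality A
Q, _ = np.linalg.qr(rng.normal(size=(32, 32))) # a random orthogonal matrix
B = A @ Q                                      # same structure, different orientation

# Closed-form Procrustes solution: SVD of A^T B gives the optimal rotation/reflection
U, _, Vt = np.linalg.svd(A.T @ B)
R = U @ Vt

raw = np.sum((A - B) ** 2) / np.sum(B ** 2)
aligned = np.sum((A @ R - B) ** 2) / np.sum(B ** 2)
print(raw, aligned)  # raw is O(1); aligned is numerically zero
```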


Reconstruction and Encoding Errors

Encoding Accuracy

Script: compute_errors.py
Notebook section: “11. Compute errors”

For each modality and data split (train/val/test), the encoding REE measures:

REE_encode = || encoder(x) - z_joint ||^2 / || z_joint ||^2

where z_joint is the canonical joint latent. This quantifies how consistently different encoders map physical inputs to the shared latent space.

Decoding Accuracy

Using the same script, decoding REE is computed:

REE_decode = || decoder(z_joint) - x ||^2 / || x ||^2

This measures how faithfully each decoder reconstructs its modality from the joint latent.
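Both formulas reduce to a few lines of NumPy. The sketch below mirrors the definitions above rather than the actual compute_errors.py code; encoder and decoder are stand-in callables, and x and z_joint are arrays covering the same set of samples.

```python
import numpy as np

def encode_decode_ree(encoder, decoder, x, z_joint):
    """Encoding and decoding REE for one modality.

    encoder/decoder: hypothetical callables for this modality
    x:               batch of physical fields, shape (n, ...)
    z_joint:         canonical joint latents for the same samples, shape (n, d)
    """
    z_hat = encoder(x)
    x_hat = decoder(z_joint)
    ree_encode = np.sum((z_hat - z_joint) ** 2) / np.sum(z_joint ** 2)
    ree_decode = np.sum((x_hat - x) ** 2) / np.sum(x ** 2)
    return ree_encode, ree_decode
```

Looping over modalities and splits (train/val/test) produces the error tables below.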

Reading the Error Tables

Results are reported per modality and per split. A typical output looks like:

| Modality | Train REE | Val REE | Test REE |
| --- | --- | --- | --- |
| u_32 | 0.003 | 0.004 | 0.004 |
| u_64 | 0.005 | 0.006 | 0.007 |
| u_128 | 0.008 | 0.010 | 0.011 |
| u_256 | 0.012 | 0.015 | 0.016 |
| streamfunc | 0.001 | 0.001 | 0.002 |

What to look for:

  • Monotonic increase with resolution: Higher-resolution fields are harder to reconstruct, so REE typically increases from the coarsest grid (u_32) to the finest (u_256). This is expected.

  • Train-test gap: A small gap (within 2x) is normal. A large gap indicates overfitting.

  • Streamfunction REE: Should be very low since coefficient vectors are already low-dimensional.

What Good Results Look Like

  • Same-modality reconstruction REE: 0.001 - 0.01 (excellent), 0.01 - 0.05 (acceptable)

  • Cross-modal reconstruction REE: 0.01 - 0.05 (good), 0.05 - 0.10 (marginal)

  • Encoding REE: < 0.01 (encoders agree on the joint latent)

What Bad Results Look Like

  • REE above 0.10 for any modality: the autoencoder is not compressing well enough

  • Test REE more than 5x train REE: severe overfitting

  • Cross-modal REE much larger than same-modal REE: alignment failed

  • One modality with dramatically higher error: that modality’s autoencoder needs retraining
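The rules of thumb above can be turned into a quick sanity check. The helper below is a hypothetical convenience, not part of the pipeline; the thresholds are exactly those listed in this section and should be tuned for your data.

```python
def flag_issues(train_ree, test_ree, cross_modal=False):
    """Flag one modality's REE values against the rules of thumb above."""
    issues = []
    if test_ree > 0.10:
        issues.append("REE above 0.10: autoencoder is not compressing well enough")
    elif cross_modal and test_ree > 0.05:
        issues.append("cross-modal REE above 0.05: marginal")
    if train_ree > 0 and test_ree > 5 * train_ree:
        issues.append("test REE more than 5x train REE: severe overfitting")
    elif train_ree > 0 and test_ree > 2 * train_ree:
        issues.append("train-test gap above 2x: possible overfitting")
    return issues
```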


Qualitative Cross-Modality Inspection

Script: plot_modalities.py
Notebook section: “10. Plot modalities”

This script provides visual validation:

  1. Selects a random source modality and sample

  2. Encodes the sample into the joint latent space

  3. Decodes from the joint latent into all target modalities

  4. Displays original fields and reconstructions side-by-side
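The four steps can be sketched as follows. Here encoder and decoders are stand-in callables, and the layout (originals on top, reconstructions below) is one reasonable choice, not necessarily what plot_modalities.py produces.

```python
import matplotlib.pyplot as plt

def plot_cross_modal(fields, encoder, decoders, source):
    """Side-by-side originals vs. reconstructions decoded from the joint latent.

    fields:   dict of modality name -> 2-D field (the original sample)
    encoder:  callable mapping the source field to the joint latent
    decoders: dict of modality name -> callable from joint latent to field
    """
    z = encoder(fields[source])                 # step 2: encode to joint latent
    names = list(fields)
    fig, axes = plt.subplots(2, len(names), figsize=(3 * len(names), 6))
    for j, name in enumerate(names):            # steps 3-4: decode and display
        axes[0, j].imshow(fields[name])
        axes[0, j].set_title(f"original {name}")
        axes[1, j].imshow(decoders[name](z))
        axes[1, j].set_title(f"from {source} latent")
    for ax in axes.ravel():
        ax.axis("off")
    fig.tight_layout()
    return fig
```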

What to Look for in Plots

Good signs:

  • Reconstructed solution fields visually match originals at all resolutions

  • Boundary layers and sharp features are preserved (not smoothed away)

  • Cross-resolution reconstructions maintain consistent physical structure

Warning signs:

  • Systematic smoothing: Sharp features (boundary layers, internal layers) are blurred. This suggests the autoencoder latent dimension is too small.

  • Checkerboard artifacts: Grid-scale oscillations in reconstructions. This suggests the decoder architecture has issues (common with transposed convolutions).

  • Resolution-dependent bias: Low-resolution reconstructions look fine but high-resolution ones fail. This suggests the high-resolution autoencoder needs more capacity.

  • Modality confusion: Reconstructed streamfunction coefficients do not correspond to the velocity field seen in the solution. This indicates an alignment failure.


Supporting Visualization

Script: plot_solutions.py
Notebook section: “3. Plot solutions”

Visualizes raw PDE solutions at multiple spatial resolutions. Used to:

  • Build intuition about how solutions vary across the parameter space

  • Provide reference fields for visual comparison with reconstructions

  • Observe how advection-dominated solutions differ from diffusion-dominated ones

No quantitative metrics are derived from this script.


Latent Space Visualization Tips

While the latent space is typically 32-dimensional, useful low-dimensional projections include:

  • PCA (first 2-3 components): Shows the dominant variation directions. If modality latents cluster by modality rather than by sample, alignment has failed.

  • t-SNE or UMAP: Shows local neighborhood structure. Aligned modalities should intermingle; unaligned ones form separate clusters.

  • Pairwise scatter plots: Plot latent dimension i vs. dimension j for different modalities. Aligned modalities should show correlated clouds.
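A shared-basis PCA projection for such a scatter can be computed directly with NumPy. The helper below stacks all modality latents, projects onto the first two principal components of the combined cloud, and returns per-modality 2-D coordinates ready for a colored scatter plot; the function name and signature are illustrative.

```python
import numpy as np

def project_modalities(latents_by_modality):
    """Project all modality latents into one shared 2-D PCA basis.

    latents_by_modality: dict of modality name -> (n, d) array
    returns:             dict of modality name -> (n, 2) projected coordinates
    """
    names = list(latents_by_modality)
    X = np.vstack([latents_by_modality[n] for n in names])
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered stack are the principal directions
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[:2].T
    splits = np.cumsum([latents_by_modality[n].shape[0] for n in names])[:-1]
    return dict(zip(names, np.split(proj, splits)))
```

If the per-modality point clouds returned here separate into distinct clusters, alignment has likely failed; well-aligned modalities should overlap.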

Visualization is not evaluation

Low-dimensional projections can be misleading. Two latent spaces that look overlapping in a PCA projection may still have high REE in the full 32-dimensional space. Always use quantitative metrics (REE tables) as the primary evaluation and visualization as a supplementary check.


Result Artifacts

Generated analysis artifacts include:

| Artifact | Location | Description |
| --- | --- | --- |
| Pairwise REE matrices | results/*.csv | Raw, centered, and Procrustes-aligned REE between all modality pairs |
| Error summaries | results/*.csv | Per-modality, per-split encoding and decoding REE |
| Joint latent | latents/lat_3_ld32.npy | Canonical joint latent representation |
| Reconstruction plots | results/*.png | Cross-modal reconstruction visualizations |
| Solution visualizations | results/*.png | Raw PDE solutions at multiple resolutions |


Summary

The analysis framework combines three complementary approaches:

  1. Geometric latent-space diagnostics – Are the latent representations structurally aligned across modalities?

  2. Quantitative reconstruction error metrics – Can the original data be faithfully recovered from the joint latent?

  3. Qualitative cross-modality visual inspection – Do reconstructions look physically reasonable?

Together, these analyses validate whether the learned joint latent space provides a consistent, resolution-agnostic representation of PDE solutions suitable for downstream tasks such as interpolation, comparison, and transfer learning.

Keypoints

  • REE below 0.01 (1%) indicates good same-modality reconstruction; below 0.05 (5%) for cross-modal

  • Procrustes REE improvement indicates rotational differences (expected; alignment corrects these)

  • Error should increase monotonically with grid resolution (higher resolution = harder to compress)

  • Train-test gap within 2x is normal; larger gaps indicate overfitting

  • Visual inspection catches artifacts (smoothing, checkerboards) that REE numbers may not reveal

  • Always combine quantitative metrics with qualitative visual checks