Analyzing Results
Objectives
Interpret Relative Energy Error (REE) values and understand what constitutes good vs. poor reconstruction
Analyze latent-space alignment quality using pairwise metrics and Procrustes diagnostics
Evaluate cross-modal reconstruction fidelity and identify failure modes
Use visualization scripts to build intuition about learned representations
Distinguish between alignment quality issues and reconstruction quality issues
What to expect from a successful experiment
A well-trained pipeline should exhibit:
REE below 0.01 (1%) for same-modality reconstruction (encode and decode the same resolution)
REE below 0.05 (5%) for cross-modal reconstruction (encode one resolution, decode another)
Pairwise latent REE that decreases substantially after Procrustes alignment
Test set performance comparable to validation set (no overfitting)
Scope of Analysis
The analysis addresses three complementary questions:
Are modality-specific latent representations well aligned? (geometric consistency)
Does the joint latent representation preserve sufficient information for reconstruction? (information preservation)
How do errors vary across modalities, resolutions, and data splits? (generalization)
Visualization-only scripts (plot_solutions.py, called in notebook section “3. Plot solutions”) play a supportive role and are not part of the quantitative evaluation.
Latent-Space Alignment Analysis
Pairwise Modality Consistency
Script: analyze_latent_alignment.py
Notebook section: “7. Analyze the alignments”
This analysis operates on second-level modality latents (lat_2_*_ld32.npy) and computes:
Pairwise REE between all modality pairs (raw latent vectors)
REE after centering (mean subtraction) to remove global offset differences
REE after Procrustes alignment to measure best-case geometric agreement
Outputs:
Pairwise REE matrices (CSV) for each alignment type
REE of each modality relative to the canonical mean latent
The canonical joint latent representation (lat_3_ld32.npy)
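The raw and centered pairwise REE computations can be sketched in numpy as follows. This is an illustrative reimplementation, not the actual code of analyze_latent_alignment.py; function and variable names are hypothetical, and the real script loads the lat_2_*_ld32.npy files from disk:

```python
import numpy as np

def relative_energy_error(a, b):
    """REE = ||a - b||^2 / ||b||^2, summed over samples and latent dims."""
    return np.sum((a - b) ** 2) / np.sum(b ** 2)

def pairwise_ree(latents, center=False):
    """Compute an (M x M) REE matrix between all modality latent arrays.

    `latents` is a {modality_name: (n_samples, latent_dim) array} dict.
    With center=True, each modality's mean latent is subtracted first,
    which removes global offset differences between latent spaces.
    """
    names = list(latents)
    if center:
        latents = {k: v - v.mean(axis=0, keepdims=True) for k, v in latents.items()}
    ree = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            ree[i, j] = relative_energy_error(latents[a], latents[b])
    return names, ree
```

Note that the matrix is not symmetric in general, because the denominator uses the energy of the second argument.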
Interpreting Alignment Metrics
| Observation | Interpretation | Action |
|---|---|---|
| Low raw REE (<0.01) | Strong intrinsic alignment | Latent spaces naturally agree |
| High raw REE, low Procrustes REE | Differences are primarily rotational | Alignment is working correctly |
| High Procrustes REE | Fundamental structural disagreement | Check autoencoder training quality |
| One modality with much higher REE-to-mean | Outlier modality | May need more training epochs or a different architecture |
| Train REE << Test REE | Overfitting in alignment | Reduce model capacity or add regularization |
What Procrustes alignment tells you
Procrustes analysis finds the optimal orthogonal transformation (a rotation, possibly combined with a reflection) that aligns two point clouds. If the REE drops dramatically after Procrustes, the two latent spaces have learned similar structure in different orientations. The alignment network's job is essentially to learn this rotation, so a large Procrustes improvement is expected before alignment training and should diminish after it.
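The diagnostic can be sketched in a few lines of numpy. This is a minimal illustration, not the script's actual implementation: the orthogonal Procrustes problem is solved here via the SVD of the cross-covariance (an equivalent alternative would be `scipy.linalg.orthogonal_procrustes`):

```python
import numpy as np

def procrustes_ree(a, b):
    """REE of `a` against `b` after the best orthogonal map.

    Solves min_R ||a @ R - b||_F over orthogonal R (rotation/reflection)
    using the SVD of a^T b, then evaluates the remaining REE.
    """
    u, _, vt = np.linalg.svd(a.T @ b)
    r = u @ vt                       # optimal orthogonal matrix
    return np.sum((a @ r - b) ** 2) / np.sum(b ** 2)
```

If `b` is exactly a rotated copy of `a`, the raw REE can be large while the Procrustes REE is essentially zero, which is the "differences are primarily rotational" case in the table above.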
Reconstruction and Encoding Errors
Encoding Accuracy
Script: compute_errors.py
Notebook section: “11. Compute errors”
For each modality and data split (train/val/test), the encoding REE measures:
REE_encode = || encoder(x) - z_joint ||^2 / || z_joint ||^2
where z_joint is the canonical joint latent. This quantifies how consistently different encoders map physical inputs to the shared latent space.
Decoding Accuracy
Using the same script, decoding REE is computed:
REE_decode = || decoder(z_joint) - x ||^2 / || x ||^2
This measures how faithfully each decoder reconstructs its modality from the joint latent.
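Both error definitions can be written as small helpers. This is a sketch consistent with the formulas above, not the actual compute_errors.py code; the `encoder`/`decoder` callables stand in for the trained networks:

```python
import numpy as np

def encode_ree(encoder, x, z_joint):
    """REE_encode = ||encoder(x) - z_joint||^2 / ||z_joint||^2."""
    z = encoder(x)
    return np.sum((z - z_joint) ** 2) / np.sum(z_joint ** 2)

def decode_ree(decoder, z_joint, x):
    """REE_decode = ||decoder(z_joint) - x||^2 / ||x||^2."""
    x_hat = decoder(z_joint)
    return np.sum((x_hat - x) ** 2) / np.sum(x ** 2)
```

In the full analysis these are evaluated per modality and per split (train/val/test) to produce the error tables below.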
Reading the Error Tables
Results are reported per modality and per split. A typical output looks like:
| Modality | Train REE | Val REE | Test REE |
|---|---|---|---|
| u_32 | 0.003 | 0.004 | 0.004 |
| u_64 | 0.005 | 0.006 | 0.007 |
| u_128 | 0.008 | 0.010 | 0.011 |
| u_256 | 0.012 | 0.015 | 0.016 |
| streamfunc | 0.001 | 0.001 | 0.002 |
What to look for:
Monotonic increase with resolution: Higher-resolution fields are harder to reconstruct, so REE typically increases from the coarsest to the finest grid (u_32 to u_256 in the table above). This is expected.
Train-test gap: A small gap (within 2x) is normal. A large gap indicates overfitting.
Streamfunction REE: Should be very low since coefficient vectors are already low-dimensional.
What Good Results Look Like
Same-modality reconstruction REE: 0.001 - 0.01 (excellent), 0.01 - 0.05 (acceptable)
Cross-modal reconstruction REE: 0.01 - 0.05 (good), 0.05 - 0.10 (marginal)
Encoding REE: < 0.01 (encoders agree on the joint latent)
What Bad Results Look Like
REE above 0.10 for any modality: the autoencoder is not compressing well enough
Test REE more than 5x train REE: severe overfitting
Cross-modal REE much larger than same-modal REE: alignment failed
One modality with dramatically higher error: that modality’s autoencoder needs retraining
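The thresholds above are easy to turn into an automated sanity check. The function and table structure below are hypothetical, a sketch of how one might flag these failure modes from a per-modality results dict:

```python
def check_error_table(ree):
    """Flag common problems in a {modality: {split: REE}} error table.

    Applies the rules of thumb from the text: test REE above 0.10 means
    poor compression; test REE far above train REE means overfitting.
    """
    warnings = []
    for mod, splits in ree.items():
        if splits["test"] > 2.0 * splits["train"]:
            warnings.append(f"{mod}: train-test gap exceeds 2x (possible overfitting)")
        if splits["test"] > 0.10:
            warnings.append(f"{mod}: test REE above 0.10 (poor compression)")
    return warnings
```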
Qualitative Cross-Modality Inspection
Script: plot_modalities.py
Notebook section: “10. Plot modalities”
This script provides visual validation:
Selects a random source modality and sample
Encodes the sample into the joint latent space
Decodes from the joint latent into all target modalities
Displays original fields and reconstructions side-by-side
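The encode-then-decode-everywhere step can be sketched as follows. This is not the plot_modalities.py source; the `encoders`/`decoders` dicts are hypothetical stand-ins for the trained per-modality networks, and the plotting itself is omitted:

```python
def cross_modal_reconstruct(sample, source, encoders, decoders):
    """Encode one sample from `source`, then decode into every modality.

    `encoders` and `decoders` map modality names to callables; the
    returned dict holds one reconstruction per target modality, ready
    to display next to the originals.
    """
    z = encoders[source](sample)              # joint latent for this sample
    return {m: dec(z) for m, dec in decoders.items()}
```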
What to Look for in Plots
Good signs:
Reconstructed solution fields visually match originals at all resolutions
Boundary layers and sharp features are preserved (not smoothed away)
Cross-resolution reconstructions maintain consistent physical structure
Warning signs:
Systematic smoothing: Sharp features (boundary layers, internal layers) are blurred. This suggests the autoencoder latent dimension is too small.
Checkerboard artifacts: Grid-scale oscillations in reconstructions. This suggests the decoder architecture has issues (common with transposed convolutions).
Resolution-dependent bias: Low-resolution reconstructions look fine but high-resolution ones fail. This suggests the high-resolution autoencoder needs more capacity.
Modality confusion: Reconstructed streamfunction coefficients do not correspond to the velocity field seen in the solution. This indicates an alignment failure.
Supporting Visualization
Script: plot_solutions.py
Notebook section: “3. Plot solutions”
Visualizes raw PDE solutions at multiple spatial resolutions. Used to:
Build intuition about how solutions vary across the parameter space
Provide reference fields for visual comparison with reconstructions
Observe how advection-dominated solutions differ from diffusion-dominated ones
No quantitative metrics are derived from this script.
Latent Space Visualization Tips
While the latent space is typically 32-dimensional, useful low-dimensional projections include:
PCA (first 2-3 components): Shows the dominant variation directions. If modality latents cluster by modality rather than by sample, alignment has failed.
t-SNE or UMAP: Shows local neighborhood structure. Aligned modalities should intermingle; unaligned ones form separate clusters.
Pairwise scatter plots: Plot latent dimension i vs. dimension j for different modalities. Aligned modalities should show correlated clouds.
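A PCA projection of stacked modality latents can be computed directly with numpy, without extra dependencies. This is an illustrative sketch (names hypothetical); the returned coordinates can be scattered and colored by the label array to check for modality clustering:

```python
import numpy as np

def pca_project(latents, n_components=2):
    """Project stacked modality latents onto their leading PCA components.

    `latents` is a {modality_name: (n_samples, latent_dim) array} dict.
    Returns (coords, labels): low-dimensional coordinates and the
    modality name of each row, ready for a colored scatter plot.
    """
    names = list(latents)
    stacked = np.vstack([latents[m] for m in names])
    labels = np.concatenate([[m] * len(latents[m]) for m in names])
    centered = stacked - stacked.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = PCs
    return centered @ vt[:n_components].T, labels
```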
Visualization is not evaluation
Low-dimensional projections can be misleading. Two latent spaces that look overlapping in a PCA projection may still have high REE in the full 32-dimensional space. Always use quantitative metrics (REE tables) as the primary evaluation and visualization as a supplementary check.
Result Artifacts
Generated analysis artifacts include:
| Artifact | Location | Description |
|---|---|---|
| Pairwise REE matrices | | Raw, centered, and Procrustes-aligned REE between all modality pairs |
| Error summaries | | Per-modality, per-split encoding and decoding REE |
| Joint latent | | Canonical joint latent representation |
| Reconstruction plots | | Cross-modal reconstruction visualizations |
| Solution visualizations | | Raw PDE solutions at multiple resolutions |
Summary
The analysis framework combines three complementary approaches:
Geometric latent-space diagnostics – Are the latent representations structurally aligned across modalities?
Quantitative reconstruction error metrics – Can the original data be faithfully recovered from the joint latent?
Qualitative cross-modality visual inspection – Do reconstructions look physically reasonable?
Together, these analyses validate whether the learned joint latent space provides a consistent, resolution-agnostic representation of PDE solutions suitable for downstream tasks such as interpolation, comparison, and transfer learning.
Keypoints
REE below 0.01 (1%) indicates good same-modality reconstruction; below 0.05 (5%) for cross-modal
Procrustes REE improvement indicates rotational differences (expected; alignment corrects these)
Error should increase monotonically with grid resolution (higher resolution = harder to compress)
Train-test gap within 2x is normal; larger gaps indicate overfitting
Visual inspection catches artifacts (smoothing, checkerboards) that REE numbers may not reveal
Always combine quantitative metrics with qualitative visual checks