# Analyzing Results

```{objectives}
- Interpret Relative Energy Error (REE) values and understand what constitutes good vs. poor reconstruction
- Analyze latent-space alignment quality using pairwise metrics and Procrustes diagnostics
- Evaluate cross-modal reconstruction fidelity and identify failure modes
- Use visualization scripts to build intuition about learned representations
- Distinguish between alignment quality issues and reconstruction quality issues
```

```{admonition} What to expect from a successful experiment
:class: tip

A well-trained pipeline should exhibit:
- **REE below 0.01 (1%)** for same-modality reconstruction (encode and decode the same resolution)
- **REE below 0.05 (5%)** for cross-modal reconstruction (encode one resolution, decode another)
- **Pairwise latent REE** that decreases substantially after Procrustes alignment
- **Test set performance** comparable to validation set (no overfitting)
```

## Scope of Analysis

The analysis addresses three complementary questions:

1. **Are modality-specific latent representations well aligned?** (geometric consistency)
2. **Does the joint latent representation preserve sufficient information for reconstruction?** (information preservation)
3. **How do errors vary across modalities, resolutions, and data splits?** (generalization)

Visualization-only scripts (`plot_solutions.py`, called in notebook section "3. Plot solutions") play a supporting role and are not part of the quantitative evaluation.

---

## Latent-Space Alignment Analysis

### Pairwise Modality Consistency

**Script:** `analyze_latent_alignment.py`
**Notebook section:** "7. Analyze the alignments"

This analysis operates on second-level modality latents (`lat_2_*_ld32.npy`) and computes:

- **Pairwise REE** between all modality pairs (raw latent vectors)
- **REE after centering** (mean subtraction) to remove global offset differences
- **REE after Procrustes alignment** to measure best-case geometric agreement
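
The three quantities listed above can be sketched with NumPy alone. This is a minimal illustration, not the code in `analyze_latent_alignment.py`: the helper names (`ree`, `procrustes_rotation`, `pairwise_ree`) are hypothetical, the latents are synthetic, and REE is assumed to be a ratio of squared norms summed over all samples.

```python
import numpy as np


def ree(a, b):
    """Relative energy error of `a` measured against the reference `b`."""
    return float(np.sum((a - b) ** 2) / np.sum(b ** 2))


def procrustes_rotation(a, b):
    """Orthogonal matrix R minimizing ||a @ R - b||_F (SVD solution)."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def pairwise_ree(latents):
    """Raw, centered, and Procrustes REE matrices for a dict of latents.

    `latents` maps modality name -> array of shape (n_samples, latent_dim).
    Entry [i, j] compares modality i against reference modality j.
    """
    names = list(latents)
    n = len(names)
    raw, centered, aligned = (np.zeros((n, n)) for _ in range(3))
    for i, name_a in enumerate(names):
        for j, name_b in enumerate(names):
            a, b = latents[name_a], latents[name_b]
            a0, b0 = a - a.mean(axis=0), b - b.mean(axis=0)
            raw[i, j] = ree(a, b)
            centered[i, j] = ree(a0, b0)
            aligned[i, j] = ree(a0 @ procrustes_rotation(a0, b0), b0)
    return names, raw, centered, aligned


# Synthetic demo: the second "modality" is a rotated, shifted, slightly
# noisy copy of the first, so only the Procrustes REE should be small.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 32))
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random orthogonal matrix
demo = {"u_32": z, "u_64": z @ q + 0.5 + 0.01 * rng.normal(size=z.shape)}
names, raw, centered, aligned = pairwise_ree(demo)
```

In this toy case the raw and centered REE stay large because the copy is rotated, while the Procrustes REE collapses to the noise level. That is the "similar structure, different orientation" pattern the analysis is designed to detect.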

**Outputs:**

- Pairwise REE matrices (CSV) for each alignment type
- REE of each modality relative to the canonical mean latent
- The canonical joint latent representation (`lat_3_ld32.npy`)

### Interpreting Alignment Metrics

| Observation | Interpretation | Action |
|-------------|---------------|--------|
| Low raw REE (<0.01) | Strong intrinsic alignment | None needed; latent spaces already agree |
| High raw REE, low Procrustes REE | Differences are primarily rotational | Expected before alignment training; the alignment step should remove this gap |
| High Procrustes REE | Fundamental structural disagreement | Check autoencoder training quality |
| One modality with much higher REE-to-mean | Outlier modality | May need more training epochs or different architecture |
| Train REE << Test REE | Overfitting in alignment | Reduce model capacity or add regularization |

```{admonition} What Procrustes alignment tells you
:class: note

Orthogonal Procrustes finds the orthogonal transformation (a rotation, possibly combined with a reflection) that best aligns two point clouds in the least-squares sense. If the REE drops sharply after Procrustes, the two latent spaces have learned similar *structure* but in different *orientations*. The alignment network's job is essentially to learn this rotation, so a large Procrustes improvement is expected before alignment training and should shrink afterwards.
```
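
For reference, the orthogonal Procrustes problem can be written out explicitly. This is the standard textbook formulation and may differ in detail from what `analyze_latent_alignment.py` implements. The symbols are notation introduced here: `A` and `B` are the centered latent matrices of two modalities (one sample per row), `R` is an orthogonal matrix, and the squared Frobenius norm matches the squared-norm convention of the REE formulas in this section.

```{math}
R^{*} = \arg\min_{R^{\top} R = I} \lVert A R - B \rVert_F^{2},
\qquad
R^{*} = U V^{\top} \quad \text{where} \quad A^{\top} B = U \Sigma V^{\top}

\mathrm{REE}_{\mathrm{Procrustes}} = \frac{\lVert A R^{*} - B \rVert_F^{2}}{\lVert B \rVert_F^{2}}
```

Because the identity matrix is itself orthogonal, the Procrustes REE can never exceed the centered REE. The size of the drop measures how much of the disagreement is purely rotational.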

---

## Reconstruction and Encoding Errors

### Encoding Accuracy

**Script:** `compute_errors.py`
**Notebook section:** "11. Compute errors"

For each modality and data split (train/val/test), the encoding REE measures:

```
REE_encode = || encoder(x) - z_joint ||^2 / || z_joint ||^2
```

where `z_joint` is the canonical joint latent. This quantifies how consistently different encoders map physical inputs to the shared latent space.

### Decoding Accuracy

Using the same script, decoding REE is computed:

```
REE_decode = || decoder(z_joint) - x ||^2 / || x ||^2
```

This measures how faithfully each decoder reconstructs its modality from the joint latent.
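
Both ratios can be sketched in a few lines. The example below is an illustration under stated assumptions, not the contents of `compute_errors.py`: `encoder` and `decoder` are hypothetical linear stand-ins for a trained autoencoder, the data are synthetic, and the norms are assumed to be summed over all samples in a split.

```python
import numpy as np


def ree(pred, target):
    """Squared-norm relative error of `pred` against `target`."""
    return float(np.sum((pred - target) ** 2) / np.sum(target ** 2))


rng = np.random.default_rng(0)

# Synthetic modality: 64-dimensional fields generated from 32-dimensional latents.
z_true = rng.normal(size=(100, 32))
basis = rng.normal(size=(32, 64))
x = z_true @ basis

# Stand-in for the canonical joint latent: the true latent plus small noise.
z_joint = z_true + 0.01 * rng.normal(size=z_true.shape)


def encoder(fields):
    """Hypothetical encoder: exact linear inverse of the generating map."""
    return fields @ np.linalg.pinv(basis)


def decoder(latent):
    """Hypothetical decoder: the generating linear map."""
    return latent @ basis


ree_encode = ree(encoder(x), z_joint)   # || encoder(x) - z_joint ||^2 / || z_joint ||^2
ree_decode = ree(decoder(z_joint), x)   # || decoder(z_joint) - x ||^2 / || x ||^2
```

With real models, the same two calls would be repeated for every modality and every split to fill the error tables.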

### Reading the Error Tables

Results are reported per modality and per split. A typical output looks like:

| Modality | Train REE | Val REE | Test REE |
|----------|-----------|---------|----------|
| u_32 | 0.003 | 0.004 | 0.004 |
| u_64 | 0.005 | 0.006 | 0.007 |
| u_128 | 0.008 | 0.010 | 0.011 |
| u_256 | 0.012 | 0.015 | 0.016 |
| streamfunc | 0.001 | 0.001 | 0.002 |

**What to look for:**

- **Monotonic increase with resolution**: Higher-resolution fields are harder to reconstruct, so REE typically increases with grid size (from u_32 to u_256 in the table above). This is expected.
- **Train-test gap**: A test REE within about 2x of the train REE is normal. A substantially larger ratio indicates overfitting.
- **Streamfunction REE**: Should be very low since coefficient vectors are already low-dimensional.
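
These checks are easy to automate once the table is loaded. The sketch below hard-codes the illustrative values from the table above; the variable and function names are hypothetical, and a real workflow would read the values from the CSV summaries instead.

```python
# (train, val, test) REE per modality, copied from the illustrative table.
table = {
    "u_32": (0.003, 0.004, 0.004),
    "u_64": (0.005, 0.006, 0.007),
    "u_128": (0.008, 0.010, 0.011),
    "u_256": (0.012, 0.015, 0.016),
    "streamfunc": (0.001, 0.001, 0.002),
}


def train_test_ratio(row):
    """Ratio of test REE to train REE for one table row."""
    train, _, test = row
    return test / train


def is_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


ratios = {name: train_test_ratio(row) for name, row in table.items()}
within_2x = {name: ratio <= 2.0 for name, ratio in ratios.items()}

# Resolution trend, checked on the test split of the velocity-field modalities.
field_test_ree = [table[name][2] for name in ("u_32", "u_64", "u_128", "u_256")]
resolution_trend_ok = is_increasing(field_test_ree)
```

Ratios are more informative than absolute gaps here: the streamfunction row has the largest ratio in the table even though its absolute errors are the smallest.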

### What Good Results Look Like

- Same-modality reconstruction REE: **0.001 - 0.01** (excellent), **0.01 - 0.05** (acceptable)
- Cross-modal reconstruction REE: **0.01 - 0.05** (good), **0.05 - 0.10** (marginal)
- Encoding REE: **< 0.01** (encoders agree on the joint latent)

### What Bad Results Look Like

- REE above 0.10 for any modality: the autoencoder is not reconstructing that modality accurately (too much information is lost in compression)
- Test REE more than 5x train REE: severe overfitting
- Cross-modal REE much larger than same-modal REE: alignment failed
- One modality with dramatically higher error: that modality's autoencoder needs retraining
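
The thresholds in the two lists above can be collected into a small helper for screening a results table. This is a sketch, not part of the pipeline: the function names are hypothetical, the handling of values that fall exactly on a boundary is an assumption, and the "borderline" label for same-modality REE between 0.05 and 0.10 is introduced here because the text does not name that band.

```python
def grade_ree(value, cross_modal=False):
    """Map a reconstruction REE to the qualitative bands used in this section."""
    if value > 0.10:
        return "poor"
    if cross_modal:
        # Cross-modal bands: up to 0.05 is good, 0.05 to 0.10 is marginal.
        return "marginal" if value > 0.05 else "good"
    if value > 0.05:
        return "borderline"  # band not named in the text (assumption)
    if value > 0.01:
        return "acceptable"
    return "excellent"


def overfitting_flag(train_ree, test_ree):
    """Classify the train-test gap using the 2x and 5x rules of thumb."""
    ratio = test_ree / train_ree
    if ratio > 5.0:
        return "severe overfitting"
    if ratio > 2.0:
        return "overfitting"
    return "none"
```

For example, `grade_ree(0.004)` falls in the excellent band, while `overfitting_flag(0.001, 0.01)` is flagged as severe because the test error is ten times the train error.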

---

## Qualitative Cross-Modality Inspection

**Script:** `plot_modalities.py`
**Notebook section:** "10. Plot modalities"

This script provides visual validation:

1. Selects a random source modality and sample
2. Encodes the sample into the joint latent space
3. Decodes from the joint latent into **all** target modalities
4. Displays original fields and reconstructions side-by-side
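
The first three steps can be sketched without any plotting code. Everything below is a toy stand-in, not the logic of `plot_modalities.py`: the `encode` and `decode` functions are hypothetical linear maps, the feature sizes are arbitrary small numbers rather than real grid sizes, and a per-target REE replaces the side-by-side display of step 4.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_samples = 32, 50

# Toy flattened field sizes per modality (not the real grid sizes).
feature_sizes = {"u_32": 64, "u_64": 128, "u_128": 256}

# Every modality is generated from the same latent through its own linear map.
z_true = rng.normal(size=(n_samples, latent_dim))
bases = {m: rng.normal(size=(latent_dim, d)) for m, d in feature_sizes.items()}
data = {m: z_true @ bases[m] for m in bases}


def encode(modality, field):
    """Hypothetical encoder for one modality (exact linear inverse)."""
    return field @ np.linalg.pinv(bases[modality])


def decode(modality, latent):
    """Hypothetical decoder for one modality."""
    return latent @ bases[modality]


def ree(pred, target):
    return float(np.sum((pred - target) ** 2) / np.sum(target ** 2))


# 1. Select a random source modality and sample.
modalities = list(data)
source = modalities[rng.integers(len(modalities))]
idx = int(rng.integers(n_samples))

# 2. Encode the sample into the joint latent space.
z = encode(source, data[source][idx])

# 3. Decode from the joint latent into all target modalities.
reconstructions = {m: decode(m, z) for m in modalities}

# 4. Stand-in for the plots: compare each reconstruction with its original.
errors = {m: ree(reconstructions[m], data[m][idx]) for m in modalities}
```

In this idealized setup every target is recovered almost exactly. With trained networks, the entries of `errors` for targets other than `source` correspond to the cross-modal REE discussed earlier.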

### What to Look for in Plots

**Good signs:**

- Reconstructed solution fields visually match originals at all resolutions
- Boundary layers and sharp features are preserved (not smoothed away)
- Cross-resolution reconstructions maintain consistent physical structure

**Warning signs:**

- **Systematic smoothing**: Sharp features (boundary layers, internal layers) are blurred. This suggests the autoencoder latent dimension is too small.
- **Checkerboard artifacts**: Grid-scale oscillations in reconstructions. This suggests the decoder architecture has issues (common with transposed convolutions).
- **Resolution-dependent bias**: Low-resolution reconstructions look fine but high-resolution ones fail. This suggests the high-resolution autoencoder needs more capacity.
- **Modality confusion**: Reconstructed streamfunction coefficients do not correspond to the velocity field seen in the solution. This indicates an alignment failure.

---

## Supporting Visualization

**Script:** `plot_solutions.py`
**Notebook section:** "3. Plot solutions"

Visualizes raw PDE solutions at multiple spatial resolutions. Used to:

- Build intuition about how solutions vary across the parameter space
- Provide reference fields for visual comparison with reconstructions
- Observe how advection-dominated solutions differ from diffusion-dominated ones

No quantitative metrics are derived from this script.

---

## Latent Space Visualization Tips

While the latent space is typically 32-dimensional, useful low-dimensional projections include:

- **PCA (first 2-3 components)**: Shows the dominant variation directions. If modality latents cluster by modality rather than by sample, alignment has failed.
- **t-SNE or UMAP**: Shows local neighborhood structure. Aligned modalities should intermingle; unaligned ones form separate clusters.
- **Pairwise scatter plots**: Plot latent dimension *i* vs. dimension *j* for different modalities. Aligned modalities should show correlated clouds.
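
The PCA check can be sketched with NumPy's SVD. The example is illustrative only: the function names are hypothetical, the latents are synthetic, and the separation score (distance between modality centroids divided by the within-modality spread in the projected plane) is an ad hoc summary introduced here, not a metric from the pipeline.

```python
import numpy as np


def pca_project(points, n_components=2):
    """Project rows of `points` onto their leading principal components."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def modality_separation(z_a, z_b, n_components=2):
    """Centroid distance over within-modality spread in the PCA plane.

    Values near zero mean the two modalities intermingle; large values
    mean the projection clusters by modality.
    """
    proj = pca_project(np.vstack([z_a, z_b]), n_components)
    p_a, p_b = proj[: len(z_a)], proj[len(z_a):]
    centroid_gap = np.linalg.norm(p_a.mean(axis=0) - p_b.mean(axis=0))
    spread = 0.5 * (p_a.std(axis=0).mean() + p_b.std(axis=0).mean())
    return float(centroid_gap / spread)


rng = np.random.default_rng(0)
z_a = rng.normal(size=(200, 32))

# Well-aligned second modality: same latents plus small noise.
sep_aligned = modality_separation(z_a, z_a + 0.05 * rng.normal(size=z_a.shape))

# Misaligned second modality: same latents with a constant offset.
sep_offset = modality_separation(z_a, z_a + 5.0)
```

Coloring the projected points by modality in a scatter plot shows the same thing visually: the aligned pair overlaps, while the offset pair splits into two separate clouds along the first component.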

```{admonition} Visualization is not evaluation
:class: warning

Low-dimensional projections can be misleading. Two latent spaces that look overlapping in a PCA projection may still have high REE in the full 32-dimensional space. Always use quantitative metrics (REE tables) as the primary evaluation and visualization as a supplementary check.
```

---

## Result Artifacts

Generated analysis artifacts include:

| Artifact | Location | Description |
|----------|----------|-------------|
| Pairwise REE matrices | `results/*.csv` | Raw, centered, and Procrustes-aligned REE between all modality pairs |
| Error summaries | `results/*.csv` | Per-modality, per-split encoding and decoding REE |
| Joint latent | `latents/lat_3_ld32.npy` | Canonical joint latent representation |
| Reconstruction plots | `results/*.png` | Cross-modal reconstruction visualizations |
| Solution visualizations | `results/*.png` | Raw PDE solutions at multiple resolutions |
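
A short helper can gather these artifacts for inspection. The sketch assumes the relative paths from the table above under a run directory; the function names are hypothetical, and the CSV files are only listed, not parsed, because their column layout is not specified here.

```python
from pathlib import Path

import numpy as np


def collect_artifacts(root="."):
    """List result files and the expected joint-latent path under `root`."""
    root = Path(root)
    return {
        "csv": sorted(root.glob("results/*.csv")),
        "png": sorted(root.glob("results/*.png")),
        "joint_latent": root / "latents" / "lat_3_ld32.npy",
    }


def load_joint_latent(root="."):
    """Load the canonical joint latent, or return None if it is missing."""
    path = collect_artifacts(root)["joint_latent"]
    return np.load(path) if path.exists() else None
```

Returning `None` for a missing latent file, instead of raising, makes it easy to call the helper before all pipeline stages have run.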

---

## Summary

The analysis framework combines three complementary approaches:

1. **Geometric latent-space diagnostics** -- Are the latent representations structurally aligned across modalities?
2. **Quantitative reconstruction error metrics** -- Can the original data be faithfully recovered from the joint latent?
3. **Qualitative cross-modality visual inspection** -- Do reconstructions look physically reasonable?

Together, these analyses validate whether the learned joint latent space provides a consistent, resolution-agnostic representation of PDE solutions suitable for downstream tasks such as interpolation, comparison, and transfer learning.

```{keypoints}
- REE below 0.01 (1%) indicates good same-modality reconstruction; below 0.05 (5%) for cross-modal
- Procrustes REE improvement indicates rotational differences (expected; alignment corrects these)
- Error typically increases with grid resolution (higher resolution = harder to compress)
- Train-test gap within 2x is normal; larger gaps indicate overfitting
- Visual inspection catches artifacts (smoothing, checkerboards) that REE numbers may not reveal
- Always combine quantitative metrics with qualitative visual checks
```