Group Meeting — Discussion Draft

1 / 5

Goal: derive an amino-acid likelihood from UNAAGI

Takeaway: The target is an amino-acid likelihood signal, not fitness itself; that the two correlate is a hypothesis, not an assumption.
2 / 5

Monte Carlo gives a direct route, but not a satisfying one

Takeaway: Sampling-based estimation is possible in principle, but computationally and statistically weak for amino-acid ranking.
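
A minimal sketch of the statistical weakness, assuming "Monte Carlo" here means counting residue identities among samples drawn for a fixed environment. The function name `mc_identity_frequencies` and the toy distribution `true_p` are illustrative assumptions, not part of UNAAGI.

```python
import numpy as np

def mc_identity_frequencies(sample_identities, n_classes):
    """Estimate class frequencies and their binomial standard errors from samples."""
    counts = np.bincount(sample_identities, minlength=n_classes)
    n = counts.sum()
    p_hat = counts / n
    # Binomial standard error of each estimated frequency.
    std_err = np.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err

# Toy stand-in for sampled identities (assumption: 4 classes, one dominant).
rng = np.random.default_rng(0)
true_p = np.array([0.90, 0.06, 0.03, 0.01])
samples = rng.choice(len(true_p), size=200, p=true_p)
p_hat, std_err = mc_identity_frequencies(samples, len(true_p))

for k in range(len(true_p)):
    print(f"class {k}: p_hat={p_hat[k]:.3f} +/- {std_err[k]:.3f}")
```

For a class with true probability 0.01 and 200 samples, the standard error is about 0.007, comparable to the value being estimated, so ranking low-probability residues this way needs many more samples.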
3 / 5

Natural next step: estimate $p(X \mid E)$ directly

Takeaway: Importance sampling (IS) converts the problem to density evaluation, but it requires UNAAGI to support identity conditioning and still rests on $p(X \mid E)$ being a valid score.
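
A sketch of the estimator, assuming IS stands for importance sampling with an identity-conditioned proposal. The proposal $q(X \mid aa, E)$ and the identity map $a(X)$ are notation introduced here for illustration; $q$ is assumed to be supported on structures with identity $aa$ and to be nonzero wherever $p(X \mid E)$ is nonzero on that set.

$$
P(aa \mid E) = \int \mathbb{1}[a(X) = aa]\, p(X \mid E)\, dX
= \mathbb{E}_{X \sim q(\cdot \mid aa, E)}\!\left[\frac{p(X \mid E)}{q(X \mid aa, E)}\right]
\approx \frac{1}{N} \sum_{i=1}^{N} \frac{p(X_i \mid E)}{q(X_i \mid aa, E)},
\qquad X_i \sim q(\cdot \mid aa, E).
$$

Each term needs both densities evaluated at the sampled $X_i$, which is where density evaluation enters.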
4 / 5

Why diffusion-model / flow-matching (DM/FM) likelihood may be the wrong scoring object

Probability flow ODE

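Convert the diffusion SDE to a deterministic ODE with the same marginals (Song et al. 2021); an FM model is already defined by such an ODE.
Gives the log-likelihood via the instantaneous change-of-variables formula, exact up to ODE-solver and divergence-estimation error.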
Expensive: requires a full ODE solve with divergence estimation.
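
To make the cost concrete: for an ODE $dx/dt = f(x, t)$ with data at $t = 0$ and a known prior at $t = T$, the change-of-variables formula gives $\log p_0(x(0)) = \log p_T(x(T)) + \int_0^T \nabla \cdot f(x(t), t)\, dt$. The sketch below integrates this with forward Euler and a Hutchinson divergence estimate on a toy linear field. The names `hutchinson_divergence`, `log_likelihood`, and `toy_field` are illustrative assumptions, and finite differences stand in for the autodiff Jacobian-vector products a real model would use.

```python
import numpy as np

def hutchinson_divergence(f, x, t, rng, n_probes=1, eps=1e-4):
    """Estimate div f = tr(df/dx) using Rademacher probes v, since E[v^T J v] = tr(J)."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=x.shape)
        # Central finite-difference approximation of the Jacobian-vector product J v.
        jv = (f(x + eps * v, t) - f(x - eps * v, t)) / (2.0 * eps)
        total += float(v @ jv)
    return total / n_probes

def log_likelihood(f, x0, log_prior, T=1.0, n_steps=1000, seed=0):
    """log p_0(x0) = log p_T(x(T)) + integral over [0, T] of div f along the trajectory."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    div_integral = 0.0
    for k in range(n_steps):
        t = k * dt
        div_integral += hutchinson_divergence(f, x, t, rng) * dt
        x = x + f(x, t) * dt  # forward Euler step
    return log_prior(x) + div_integral

# Toy field (assumption): scaling plus rotation, so the exact answer is known.
a, b = 0.5, 2.0
A = np.array([[a, b], [-b, a]])

def toy_field(x, t):
    return A @ x

def std_normal_logpdf(x):
    return -0.5 * float(x @ x) - 0.5 * x.size * np.log(2.0 * np.pi)

x0 = np.array([0.3, -0.2])
estimate = log_likelihood(toy_field, x0, std_normal_logpdf)

# Analytic reference: pulling N(0, I) back through this flow gives N(0, exp(-2 a T) I).
var0 = np.exp(-2.0 * a * 1.0)
exact = -0.5 * float(x0 @ x0) / var0 - 0.5 * x0.size * np.log(2.0 * np.pi * var0)
print(estimate, exact)
```

Every likelihood evaluation needs one network call per solver step plus the extra Jacobian-vector products, and for a general field the divergence estimate is noisy rather than exact as it happens to be for this linear toy.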

ELBO / variational bound

Sum denoising losses across noise levels.
Gives a lower bound on $\log p(X)$, not the exact value.
Related to, but not identical to, the training loss, which typically reweights the per-noise-level terms.
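
For reference, a sketch of the bound in the standard discrete-time DDPM setting. This setting is an assumption, since the slide does not state UNAAGI's parameterisation, and the conditioning on $E$ is omitted for brevity.

$$
\log p_\theta(X_0) \;\ge\; \mathbb{E}_q\!\left[\log p_\theta(X_0 \mid X_1)\right]
- \sum_{t=2}^{T} \mathbb{E}_q\!\left[\mathrm{KL}\!\left(q(X_{t-1} \mid X_t, X_0)\,\|\,p_\theta(X_{t-1} \mid X_t)\right)\right]
- \mathrm{KL}\!\left(q(X_T \mid X_0)\,\|\,p(X_T)\right)
$$

With fixed reverse-process variances, each KL term reduces to a weighted noise-prediction error $w_t \lVert \epsilon - \epsilon_\theta(X_t, t) \rVert^2$ plus a constant. The commonly used simplified training loss sets $w_t = 1$, so the trained objective and the bound differ by this weighting.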

Takeaway: DM/FM training optimises a likelihood surrogate, not exact MLE — accessing $p(X \mid E)$ requires additional computation with different guarantees, and may still be the wrong ranking object.
5 / 5

Reframing UNAAGI: sampler first, scorer separate?

[Diagram, two panels]
Current: $E$ → UNAAGI → samples + implicit score (score mixed with sampler)
Proposed: $(aa, E)$ (identity + context) → Sampler $p(X \mid aa, E)$ → conformers → Scorer $s(X, E)$ → aa-level score $P(aa \mid E)$

Takeaway: A modular sampler-plus-scorer may be cleaner than forcing DM/FM likelihood into a residue-scoring role — but the right scorer is an open question.
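
A minimal sketch of the proposed split, with toy stand-ins for both components. `toy_sampler`, `toy_scorer`, and `aa_distribution` are hypothetical names. Aggregating by the mean conformer score and normalising with a softmax are assumptions made only to produce a runnable example, since the choice of scorer is left open above.

```python
import numpy as np

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def toy_sampler(aa, env, n_conformers, rng):
    """Hypothetical stand-in for a sampler of p(X | aa, E); returns toy 'conformers'."""
    # Each toy conformer is a 3-vector shifted by a residue-specific offset.
    offset = AMINO_ACIDS.index(aa) / len(AMINO_ACIDS)
    return rng.normal(loc=offset, scale=1.0, size=(n_conformers, 3))

def toy_scorer(conformer, env):
    """Hypothetical stand-in for s(X, E); higher means a better fit to the environment."""
    return -float(np.sum((conformer - env) ** 2))

def aa_distribution(env, sampler, scorer, n_conformers=16, seed=0):
    """Sample conformers per residue type, score them, and normalise across residue types."""
    rng = np.random.default_rng(seed)
    aa_scores = []
    for aa in AMINO_ACIDS:
        conformers = sampler(aa, env, n_conformers, rng)
        # Aggregation choice (assumption): mean conformer score per residue type.
        aa_scores.append(np.mean([scorer(x, env) for x in conformers]))
    logits = np.array(aa_scores)
    logits = logits - logits.max()  # subtract the max for a numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return dict(zip(AMINO_ACIDS, probs))

env = np.array([0.5, 0.5, 0.5])  # toy environment descriptor (assumption)
p_aa = aa_distribution(env, toy_sampler, toy_scorer)
print(max(p_aa, key=p_aa.get))
```

Because the sampler and scorer are passed in as separate callables, either one can be swapped without touching the other, which is the modularity the proposal argues for.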