where $q_\mathrm{aa}(X) = p(X \mid \mathrm{aa}, E)$ is UNAAGI run with identity fixed to $\mathrm{aa}$ — sample conformers from the identity-conditioned model, reweight by the ratio to the unconditional density
The ratio $p(X \mid E)\,/\,p(X \mid \mathrm{aa}, E)$ measures how much more or less likely a conformer is once identity is revealed
This requires UNAAGI to be operable in a conditioned-on-identity mode — a non-trivial assumption worth flagging
The bottleneck shifts: the question is no longer how to sample, but whether $p(X \mid E)$ is a meaningful and accessible score
Takeaway: IS converts the problem to density evaluation — but requires UNAAGI to support identity conditioning, and still rests on $p(X \mid E)$ being a valid score.
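The reweighting step above can be sketched as self-normalized importance sampling. This is a minimal sketch, assuming UNAAGI exposes both log-densities; `log_p_uncond` and `log_p_cond` are hypothetical interfaces standing in for $\log p(X \mid E)$ and $\log p(X \mid \mathrm{aa}, E)$:

```python
import numpy as np

def is_expectation(conformers, f, log_p_uncond, log_p_cond):
    """Self-normalized importance sampling:
        E_{p(X|E)}[f(X)] ~ sum_i w_i f(x_i) / sum_j w_j,
    with x_i ~ q_aa(X) = p(X|aa,E) and w_i = p(x_i|E) / p(x_i|aa,E).

    log_p_uncond / log_p_cond: hypothetical callables returning the
    unconditional and identity-conditioned log-densities.
    """
    log_w = np.array([log_p_uncond(x) - log_p_cond(x) for x in conformers])
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()                      # self-normalize the weights
    return float(np.dot(w, [f(x) for x in conformers]))
```

Note that everything here is trivial except the two log-density calls, which is exactly where the next slide picks up.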
4 / 5
Why DM/FM likelihood may be the wrong scoring object
DM/FM training optimises a surrogate of data likelihood, not the exact log-likelihood directly:
Diffusion: denoising score-matching loss $\equiv$ reweighted ELBO — a lower bound on $\log p(X)$, not exact MLE
Flow matching: vector-field regression — matched to a transport process, no direct likelihood objective
Accessing $p(X \mid E)$ after training requires additional computation — two routes, with different guarantees:
Probability flow ODE
Convert the diffusion/FM process to a deterministic ODE (Song et al. 2021).
Gives exact log-likelihood via the instantaneous change-of-variables formula.
Expensive: requires a full ODE solve with divergence estimation.
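The ODE route can be sketched numerically. This is a toy sketch, not UNAAGI code: `vector_field(x, t)` is a hypothetical interface for a learned velocity transporting data (t = 0) to a standard-normal prior (t = 1), the ODE is solved with plain Euler steps, and the divergence is estimated with a Hutchinson probe plus a finite-difference Jacobian-vector product:

```python
import numpy as np

def ode_log_likelihood(x0, vector_field, n_steps=200, eps=1e-4, seed=0):
    """Log-likelihood via the probability-flow ODE (sketch).

    Instantaneous change of variables: d/dt log p(x_t) = -div v(x_t, t),
    so  log p(x_0) = log p_prior(x_1) + integral_0^1 div v(x_t, t) dt.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    div_integral = 0.0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v = vector_field(x, t)
        z = rng.standard_normal(x.shape)                  # Hutchinson probe
        jvp = (vector_field(x + eps * z, t) - v) / eps    # approx J_v @ z
        div_integral += float(z @ jvp) * dt               # E_z[z^T J z] = div v
        x = x + v * dt                                    # Euler step
    # x has been transported to the prior; add its standard-normal density
    log_prior = -0.5 * float(x @ x) - 0.5 * x.size * np.log(2.0 * np.pi)
    return log_prior + div_integral
```

Even in this toy form the cost structure is visible: two vector-field evaluations per step, hundreds of steps per conformer, and Monte Carlo noise from the divergence estimator.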
ELBO / variational bound
Sum denoising losses across noise levels.
Gives a lower bound on $\log p(X)$, not the exact value.
Related to — but not identical to — the training loss.
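The bound-versus-exact distinction can be made concrete in a toy conjugate model. This is not a diffusion; it only illustrates the variational mechanism the ELBO route relies on, with the model $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$ chosen so that both the ELBO and the exact $\log p(x)$ have closed forms:

```python
import math

def gaussian_elbo(m, s2, x):
    """Closed-form ELBO for  z ~ N(0,1),  x|z ~ N(z,1),  q(z) = N(m, s2):
        ELBO(q) = E_q[log p(x|z) + log p(z) - log q(z)] <= log p(x),
    with equality iff q is the true posterior N(x/2, 1/2).
    """
    return (-0.5 * math.log(2 * math.pi)
            - 0.5 * (x - m) ** 2 - 0.5 * m ** 2
            - s2 + 0.5 + 0.5 * math.log(s2))
```

Any mismatch between q and the true posterior opens a gap, so an ELBO-based $\log p(X \mid E)$ ranks conformers by bound tightness as well as by density, which is the worry flagged above.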
Contrast: AR and normalizing flows optimise $\log p(X \mid E)$ exactly as the training objective — likelihood is native, not derived
Hypothesis: even if $p(X \mid E)$ is numerically accessible via the ODE route, it may still be a weak or unstable residue-ranking signal — the model was shaped by a surrogate, not by the quantity we want to use
Takeaway: DM/FM training optimises a likelihood surrogate, not exact MLE — accessing $p(X \mid E)$ requires additional computation with different guarantees, and may still be the wrong ranking object.
5 / 5
Reframing UNAAGI: sampler first, scorer separate?
Hypothesis: DM/FM models may be strong samplers of plausible conformers, but weak native scorers of residue identity
Proposed decomposition:
Sampler: UNAAGI as-is, generating a conformer ensemble $\{X_i\} \sim p(X \mid E)$
Scorer: a separate function $s(X, E)$ that ranks residue identities over the sampled ensemble
Open questions:
Should $s(X, E)$ be a physics-based scorer (e.g. Rosetta REF2015), a learned model, or a hybrid?
Aggregate over conformers by sum / mean / max, or score only the top-$k$?
What training signal should supervise the scorer — experimental $\Delta\Delta G$, DMS, or structure quality?
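The aggregation question above can be sketched independently of what $s(X, E)$ turns out to be. A minimal sketch, with illustrative names; `score_table` assumes per-conformer scores have already been computed for each candidate identity:

```python
import numpy as np

def aggregate(scores, mode="mean", k=3):
    """Collapse per-conformer scores s(X_i, E) for one candidate residue
    identity into a single ranking value."""
    s = np.sort(np.asarray(scores, dtype=float))
    if mode == "sum":
        return float(s.sum())
    if mode == "mean":
        return float(s.mean())
    if mode == "max":
        return float(s[-1])
    if mode == "topk":                 # mean over the k best conformers
        return float(s[-k:].mean())
    raise ValueError(f"unknown mode: {mode}")

def rank_identities(score_table, mode="mean", k=3):
    """score_table: {identity: [per-conformer scores]} -> best-first list."""
    return sorted(score_table,
                  key=lambda aa: aggregate(score_table[aa], mode, k),
                  reverse=True)
```

The choice of mode encodes a modeling assumption: `mean` rewards identities that are good across the ensemble, `max` or `topk` rewards identities that admit at least a few excellent conformers.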
Takeaway: A modular sampler-plus-scorer may be cleaner than forcing DM/FM likelihood into a residue-scoring role — but the right scorer is an open question.