where $q_\mathrm{aa}(X) = p(X \mid \mathrm{aa}, E)$ is UNAAGI run with identity fixed to $\mathrm{aa}$ — sample conformers from the identity-conditioned model, reweight by the ratio to the unconditional density
The ratio $p(X \mid E)\,/\,p(X \mid \mathrm{aa}, E)$ measures how much more or less likely a conformer is once identity is revealed
This requires UNAAGI to be operable in a conditioned-on-identity mode — a non-trivial assumption worth flagging
The bottleneck shifts: the question is no longer how to sample, but whether $p(X \mid E)$ is a meaningful and accessible score
Takeaway: IS converts the problem to density evaluation — but requires UNAAGI to support identity conditioning, and still rests on $p(X \mid E)$ being a valid score.
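The reweighting step above can be sketched as self-normalized importance sampling. This is a minimal sketch, assuming UNAAGI exposes both log-densities; `log_p_uncond` and `log_p_cond` are hypothetical interfaces standing in for $\log p(X \mid E)$ and $\log p(X \mid \mathrm{aa}, E)$:

```python
import numpy as np

def is_expectation(conformers, f, log_p_uncond, log_p_cond):
    """Self-normalized importance sampling:
        E_{p(X|E)}[f(X)] ~ sum_i w_i f(x_i) / sum_j w_j,
    with x_i ~ q_aa(X) = p(X|aa,E) and w_i = p(x_i|E) / p(x_i|aa,E).

    log_p_uncond / log_p_cond: hypothetical callables returning the
    unconditional and identity-conditioned log-densities.
    """
    log_w = np.array([log_p_uncond(x) - log_p_cond(x) for x in conformers])
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()                      # self-normalize the weights
    return float(np.dot(w, [f(x) for x in conformers]))
```

Note that everything here is trivial except the two log-density calls, which is exactly where the next slide picks up.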
4 / 5
Why DM/FM likelihood may be the wrong scoring object
DM/FM training optimises a surrogate of data likelihood, not the exact log-likelihood directly:
Diffusion: denoising score-matching loss $\equiv$ reweighted ELBO — a lower bound on $\log p(X)$, not exact MLE
Flow matching: vector-field regression — matched to a transport process, no direct likelihood objective
Accessing $p(X \mid E)$ after training requires additional computation — two routes, with different guarantees:
Probability flow ODE
Convert the diffusion/FM process to a deterministic ODE (Song et al. 2021).
Gives exact log-likelihood via the instantaneous change-of-variables formula.
Expensive: requires a full ODE solve with divergence estimation.
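The ODE route can be sketched numerically. This is a toy sketch, not UNAAGI code: `vector_field(x, t)` is a hypothetical interface for a learned velocity transporting data (t = 0) to a standard-normal prior (t = 1), the ODE is solved with plain Euler steps, and the divergence is estimated with a Hutchinson probe plus a finite-difference Jacobian-vector product:

```python
import numpy as np

def ode_log_likelihood(x0, vector_field, n_steps=200, eps=1e-4, seed=0):
    """Log-likelihood via the probability-flow ODE (sketch).

    Instantaneous change of variables: d/dt log p(x_t) = -div v(x_t, t),
    so  log p(x_0) = log p_prior(x_1) + integral_0^1 div v(x_t, t) dt.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    div_integral = 0.0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v = vector_field(x, t)
        z = rng.standard_normal(x.shape)                  # Hutchinson probe
        jvp = (vector_field(x + eps * z, t) - v) / eps    # approx J_v @ z
        div_integral += float(z @ jvp) * dt               # E_z[z^T J z] = div v
        x = x + v * dt                                    # Euler step
    # x has been transported to the prior; add its standard-normal density
    log_prior = -0.5 * float(x @ x) - 0.5 * x.size * np.log(2.0 * np.pi)
    return log_prior + div_integral
```

Even in this toy form the cost structure is visible: two vector-field evaluations per step, hundreds of steps per conformer, and Monte Carlo noise from the divergence estimator.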
ELBO / variational bound
Sum denoising losses across noise levels.
Gives a lower bound on $\log p(X)$, not the exact value.
Related to — but not identical to — the training loss.
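The bound-versus-exact distinction can be made concrete in a toy conjugate model. This is not a diffusion; it only illustrates the variational mechanism the ELBO route relies on, with the model $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$ chosen so that both the ELBO and the exact $\log p(x)$ have closed forms:

```python
import math

def gaussian_elbo(m, s2, x):
    """Closed-form ELBO for  z ~ N(0,1),  x|z ~ N(z,1),  q(z) = N(m, s2):
        ELBO(q) = E_q[log p(x|z) + log p(z) - log q(z)] <= log p(x),
    with equality iff q is the true posterior N(x/2, 1/2).
    """
    return (-0.5 * math.log(2 * math.pi)
            - 0.5 * (x - m) ** 2 - 0.5 * m ** 2
            - s2 + 0.5 + 0.5 * math.log(s2))
```

Any mismatch between q and the true posterior opens a gap, so an ELBO-based $\log p(X \mid E)$ ranks conformers by bound tightness as well as by density, which is the worry flagged above.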
Contrast: AR and normalizing flows optimise $\log p(X \mid E)$ exactly as the training objective — likelihood is native, not derived
Hypothesis: even if $p(X \mid E)$ is numerically accessible via the ODE route, it may still be a weak or unstable residue-ranking signal — the model was shaped by a surrogate, not by the quantity we want to use
Takeaway: DM/FM training optimises a likelihood surrogate, not exact MLE — accessing $p(X \mid E)$ requires additional computation with different guarantees, and may still be the wrong ranking object.
5 / 5
Reframing UNAAGI: sampler first, scorer separate?
Hypothesis: DM/FM models may be strong samplers of plausible conformers, but weak native scorers of residue identity
Proposed decomposition:
Sampler: UNAAGI as-is, generating a conformer ensemble $\{X_i\} \sim p(X \mid E)$
Scorer: a separate function $s(X, E)$ that ranks residue identities over the sampled ensemble
Open questions:
Should $s(X, E)$ be a physics-based scorer (e.g. Rosetta REF2015), a learned model, or a hybrid?
Aggregate over conformers by sum / mean / max, or score only the top-$k$?
What training signal should supervise the scorer — experimental $\Delta\Delta G$, DMS, or structure quality?
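The aggregation question above can be sketched independently of what $s(X, E)$ turns out to be. A minimal sketch, with illustrative names; `score_table` assumes per-conformer scores have already been computed for each candidate identity:

```python
import numpy as np

def aggregate(scores, mode="mean", k=3):
    """Collapse per-conformer scores s(X_i, E) for one candidate residue
    identity into a single ranking value."""
    s = np.sort(np.asarray(scores, dtype=float))
    if mode == "sum":
        return float(s.sum())
    if mode == "mean":
        return float(s.mean())
    if mode == "max":
        return float(s[-1])
    if mode == "topk":                 # mean over the k best conformers
        return float(s[-k:].mean())
    raise ValueError(f"unknown mode: {mode}")

def rank_identities(score_table, mode="mean", k=3):
    """score_table: {identity: [per-conformer scores]} -> best-first list."""
    return sorted(score_table,
                  key=lambda aa: aggregate(score_table[aa], mode, k),
                  reverse=True)
```

The choice of mode encodes a modeling assumption: `mean` rewards identities that are good across the ensemble, `max` or `topk` rewards identities that admit at least a few excellent conformers.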
Takeaway: A modular sampler-plus-scorer may be cleaner than forcing DM/FM likelihood into a residue-scoring role — but the right scorer is an open question.