Measuring representational uncertainty
Reliability-aware Generative Annotation of Protein Function
Genome sequencing and the accompanying gene and protein discovery vastly outpace functional characterization, leaving much of protein space functionally dark. Generative protein-to-text models annotate sequences with free text but offer no reliability signal, and surface metrics cannot tell whether two descriptions refer to the same molecular function. Here we present ProtTale, which couples sequence-to-text generation with a built-in reliability head, along with an LLM-as-judge protocol that scores functional equivalence at the semantic level. On 1,031 unseen Swiss-Prot proteins held out at 40% identity, ProtTale and four baselines reach similar accuracy but cover orthogonal slices of the test set, with ProtTale uniquely recovering 60 proteins missed by every other method. The reliability head raises ProtTale’s confident match rate from 26.5% to 44.4% under a discrete filter and to 90% under a continuous score. By providing a per-prediction reliability score, ProtTale enables users to retain only trustworthy annotations, making generative function annotation practically useful even when accuracy saturates.