Better AI For Understanding Life on Earth: Predict First, Design Later

Abstract

Generative AI has sparked considerable enthusiasm for its potential to advance biological design in computational biology. In this paper we take a somewhat contrarian view, arguing that a broader and deeper understanding of existing biological sequences is essential before undertaking the design of novel ones. We draw attention, for instance, to protein function prediction methods, which currently face significant limitations due to incomplete data and inherent challenges in defining and measuring function. We propose a “blue sky” vision centered on comprehensive and precise annotation of existing protein and DNA sequences, aiming to develop a more complete understanding of biological function. By contrasting recent studies that leverage generative AI for biological design with the pressing need for enhanced data annotation, we underscore the importance of prioritizing robust predictive models over premature generative efforts. We advocate for a strategic shift toward thorough sequence annotation and predictive understanding, laying a solid foundation for future advances in biological design.

Publication
Proceedings of the 2025 SIAM International Conference on Data Mining (SDM)
Yana Bromberg
Principal Investigator - Professor of Bioinformatics

My research focuses on deciphering the DNA blueprints of life’s molecular machinery.
