A high-level programming language for generative biology with Proto | bioRxiv
Stanford University brings orchestration to AI biology workflows with Proto → Brian Hie’s team released Proto, an open framework for composing AI biology models across DNA, RNA, proteins and ligands into one pipeline. The paper introduces Proto, a high-level, open-source programming language designed for generative biology. Traditional biological engineering relies on mixing and matching literal sequence parts from nature, often through trial and error; Proto instead establishes modularity at a functional and semantic level. It uses generative AI models to bridge the gap between high-level functional intent and low-level biological sequences.
The Four Primitives
Proto unifies diverse biological AI models by breaking down design tasks into four core abstractions:
Sequences: Typed variables representing physical strings of DNA, RNA, proteins, or ligands.
Constraints: Scoring functions (ranging from basic stats like GC content to complex networks like AlphaFold) that evaluate a sequence's desirability.
Generators: Procedures that propose candidate sequences (such as language models or diffusion models).
Optimizers: Iterative loops (like MCMC or gradient descent) that guide the generator toward minimizing constraint scores.
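To make the division of labor between the four primitives concrete, here is a minimal Python sketch. It is not Proto's actual API: every name in it (Sequence, gc_constraint, point_mutation_generator, metropolis_optimize) is hypothetical and chosen only for illustration. It assumes a DNA-only toy problem, a GC-content constraint where lower scores are better, a generator that proposes single-base mutations, and a simple Metropolis-style MCMC loop as the optimizer.

```python
import math
import random
from dataclasses import dataclass

DNA_ALPHABET = "ACGT"


@dataclass(frozen=True)
class Sequence:
    """Sequence primitive (hypothetical): a residue string tagged with its modality."""
    kind: str       # e.g. "dna", "rna", "protein"
    residues: str


def gc_constraint(target: float):
    """Constraint primitive: build a scoring function; 0.0 means GC content is on target."""
    def score(seq: Sequence) -> float:
        gc = sum(base in "GC" for base in seq.residues) / len(seq.residues)
        return abs(gc - target)
    return score


def point_mutation_generator(seq: Sequence, rng: random.Random) -> Sequence:
    """Generator primitive: propose a candidate that differs at one random position."""
    i = rng.randrange(len(seq.residues))
    new_base = rng.choice([b for b in DNA_ALPHABET if b != seq.residues[i]])
    return Sequence(seq.kind, seq.residues[:i] + new_base + seq.residues[i + 1:])


def metropolis_optimize(seq, constraints, generator,
                        steps=2000, temperature=0.01, seed=0):
    """Optimizer primitive: Metropolis-style loop minimizing the summed constraint score."""
    rng = random.Random(seed)

    def total(s):
        return sum(c(s) for c in constraints)

    current, current_score = seq, total(seq)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = generator(current, rng)
        candidate_score = total(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse candidates with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    start = Sequence("dna", "A" * 40)
    best, score = metropolis_optimize(
        start, [gc_constraint(0.5)], point_mutation_generator
    )
    print(best.residues, score)
```

In this sketch, swapping the GC-content function for a learned scorer, or the point-mutation proposer for a language or diffusion model, would leave the optimizer loop unchanged. That separation is the kind of modularity the four abstractions describe.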
Key Achievements & Validation
The authors demonstrate Proto's flexibility across multiple modalities and scales:
Recapitulating Past Campaigns: They successfully reproduced diverse literature designs in silico, including symmetric protein homo-oligomers, de novo protein monomers, multi-modal CRISPR-Cas systems, 20-kb chromatin accessibility tracks, and antibody CDR designs.
Experimental Validation:
RNA Introns: Designed alternatively spliced introns tailored to specific human cell lines, validated experimentally in cell cultures.
Promoter-Repressor Pairs: Achieved leading experimental success rates for synthetic protein-DNA design.
AI Agent Integration: By coupling Proto with general-purpose AI coding agents, they showed it can generate complex biological designs (like cancer-targeting therapies and multi-step pathways) directly from natural language instructions.
Ultimately, Proto aims to make generative biological design highly structured, scalable, and accessible to researchers across varying levels of computational expertise.