Hello everyone, today I would like to share an article from Nature Biotechnology titled ‘Multistate and functional protein design using RoseTTAFold sequence space diffusion.’ The corresponding author is Professor David Baker from the University of Washington. His research focuses on protein design and related fields.
A protein's function is determined by both its sequence and its structure, so designing a new protein requires considering the two together. Current protein design methods typically generate a backbone scaffold first and then design a sequence to fit it (inverse folding). The authors argue that designing directly at the sequence level is equally important, and the method developed in this paper makes that possible.
The authors developed ProteinGenerator (PG), a diffusion model built on RoseTTAFold that operates in sequence space. A protein sequence is represented as an L × 20 matrix (sequence length by the 20 amino acid types). Selected motifs can be held fixed so that they are excluded from design; Gaussian noise is gradually added to the remaining positions, and the model is trained to recover the original sequence from the noised input. Secondary structure constraints can also be fed into the model. After training, the sequences generated by PG were predicted by AlphaFold2 to fold into structures that closely matched the intended designs. Proteins with special properties can be designed by supplying the required features of the target, such as a desired amino acid composition. For example, the authors designed proteins enriched in specific, often rare, amino acids (tryptophan, cysteine, valine, histidine or methionine); expression and structural validation confirmed that these proteins matched the structures and properties intended at design time. Designs targeting specific physical properties (charge composition, hydrophobicity, etc.) were also successful.
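The noising step described above can be illustrated with a minimal sketch. This is not the authors' code: the one-hot encoding, the variance-preserving noise formula, and the names `one_hot`, `noise_sequence`, `decode` and `alpha_bar` are all assumptions made for illustration, and the trained denoising network itself is omitted.

```python
import numpy as np

# Column order of the 20 amino acids is an arbitrary choice made for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as an (L, 20) matrix with one 1.0 per row."""
    idx = np.array([AMINO_ACIDS.index(a) for a in seq])
    x = np.zeros((len(seq), 20))
    x[np.arange(len(seq)), idx] = 1.0
    return x


def noise_sequence(x0, alpha_bar, fixed_mask, rng):
    """Add Gaussian noise to the designable rows of x0.

    Assumed formula: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    Rows where fixed_mask is True (the fixed motif) are copied from x0 unchanged.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    xt[fixed_mask] = x0[fixed_mask]
    return xt


def decode(x):
    """Read a sequence back out by taking the highest-scoring amino acid per row."""
    return "".join(AMINO_ACIDS[i] for i in x.argmax(axis=1))
```

In this sketch, `alpha_bar` close to 1 leaves the sequence almost intact and `alpha_bar` close to 0 yields nearly pure noise. A model trained as described would take the noised matrix and predict the original one, with the fixed rows serving as context.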
The authors also used PG to design bioactive proteins. For instance, by fixing the protease cleavage sequence and the bioactive peptide sequence and letting PG design the remaining positions, they could place these fixed parts in different secondary structure contexts, yielding proteins from which the active peptide is either hard or easy to release by cleavage. The authors additionally designed multistate proteins, for example one that adopts an overall alpha-beta fold when intact, whereas both fragments released after cleavage are alpha-helical.
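The idea of fixing some segments and designing the rest can be sketched as a template plus a mask. Everything here is hypothetical: the function `build_design_template`, the placeholder character `X`, and the example segments are illustrative only. The TEV protease site ENLYFQG is a well-known cleavage sequence used as a stand-in, and `PEPTIDE` is a dummy payload; the summary above does not say which sequences the authors used.

```python
def build_design_template(total_len, fixed_segments, placeholder="X"):
    """Lay out fixed segments in a sequence of total_len positions.

    fixed_segments is a list of (start_index, segment_string) pairs.
    Returns (template, fixed_mask): the template holds the fixed residues and
    the placeholder elsewhere; fixed_mask is True at positions excluded from design.
    """
    template = [placeholder] * total_len
    fixed_mask = [False] * total_len
    for start, segment in fixed_segments:
        end = start + len(segment)
        if start < 0 or end > total_len:
            raise ValueError(f"segment {segment!r} does not fit at position {start}")
        if any(fixed_mask[start:end]):
            raise ValueError(f"segment {segment!r} overlaps another fixed segment")
        template[start:end] = list(segment)
        fixed_mask[start:end] = [True] * len(segment)
    return "".join(template), fixed_mask


# Illustrative layout: a cleavage site followed directly by a dummy payload.
template, fixed_mask = build_design_template(
    30, [(10, "ENLYFQG"), (17, "PEPTIDE")]
)
```

A sequence-space generator would then fill in only the positions where `fixed_mask` is False, optionally under different secondary structure constraints for each design.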
Authors: JGG
Editor: ZF
Article link: https://doi.org/10.1038/s41587-024-02395-w
Article citation: 10.1038/s41587-024-02395-w