Enhancing PiFold: A new Deep Learning method for Inverse Protein Folding
2024
Algorta Bove, Joaquín | Chacón Montes, Pablo | Garrido Arandia, María
Computational protein design aims to create new protein structures that nature has not yet produced, in order to uncover improved properties or entirely new functionalities. The inverse folding problem is a crucial challenge in protein design whose objective is to predict a protein sequence that folds into a given structure. PiFold is a deep learning model for inverse folding that introduces a novel residue featurizer and Physics-informed Graph Neural Network layers to learn expressive residue representations in a one-shot manner. The aim of this study is to optimize PiFold in order to enhance its predictive power and to evaluate its performance relative to cutting-edge methodologies. To optimize PiFold, the featurizer module was simplified, which resulted in a 2% increase in both recovery and NSSR. Further modifications were tested: adding residue SASA as a PiFold feature did not result in a significant enhancement; class weighting mitigated frequency deviations between true and predicted amino acids at the cost of a 1% decrease in recovery; excluding membrane proteins from the training set resulted in a 1% improvement in sequence similarity metrics; and applying the ProRefiner algorithm to refine sequence predictions increased total recovery by 1%. An extensive evaluation of state-of-the-art inverse folding methods on the CATH4.2 dataset showed that the optimized PiFold model outperforms ProteinMPNN, the reference inverse folding algorithm, and obtains performance scores similar to those of SPIN-CGNN on sequence similarity metrics. In addition, structural predictions of PiFold's designed sequences showed better results than ProteinMPNN on the CATH4.2 test set and results similar to SPIN-CGNN on the Hallucination129 test set, indicating top results on both real and artificial protein structures.
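As an illustration of the sequence metrics mentioned above, the following is a minimal sketch of how sequence recovery is commonly computed (the fraction of positions where the predicted residue matches the native one), together with one possible class-weighting scheme. The function names are hypothetical, and the inverse-frequency weighting is an assumption: the abstract does not state which weighting scheme was used.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_recovery(native: str, predicted: str) -> float:
    """Fraction of aligned positions where predicted == native."""
    if len(native) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, predicted))
    return matches / len(native)


def inverse_frequency_weights(sequences):
    """Hypothetical class weights: rarer amino acids get larger weights.

    Assumed scheme (not taken from the source):
    weight = total / (n_classes_seen * count).
    """
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    n_seen = len(counts)
    return {aa: total / (n_seen * c) for aa, c in counts.items()}


if __name__ == "__main__":
    print(sequence_recovery("ACDE", "ACDF"))
    print(inverse_frequency_weights(["AAAC"]))
```

Weights of this kind could be passed to a weighted cross-entropy loss so that errors on rare residues count more, which is one way a model's predicted amino acid frequencies can be pulled closer to the true ones.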
Finally, based on a previously described choline-binding motif, a protein with three choline-binding motifs was designed in silico, using RFdiffusion to design the protein backbone and PiFold to predict the optimal sequence. PiFold provided a better outcome than ProteinMPNN, validating its utility for de novo protein design. The enhancements and extensive evaluation of PiFold demonstrate its potential as a powerful
Bibliographic information
This bibliographic record has been provided by Universidad Politécnica de Madrid