A novel sequence-based prediction method for ATP-binding sites using fusion of SMOTE algorithm and random forests classifier
2020
Song, Jiazhi | Liu, Guixia | Song, Chuyi | Jiang, Jingqing
Correctly identifying protein-ATP binding sites is valuable for both protein function annotation and drug discovery. However, non-ATP-binding residues far outnumber ATP-binding residues, which makes the prediction a classic imbalanced learning problem. Previous studies often apply under-sampling to construct a relatively balanced dataset, but some information is inevitably lost during sampling. In this work, we use the SMOTE algorithm, which balances the dataset by generating synthetic ATP-binding site samples through interpolation. Random Forest is selected as the classifier to keep training speed acceptable. Combining the classifier with a complementary template-based method further improves prediction performance. Compared with other sequence-based predictors, the proposed method achieves satisfactory performance and proves to be efficient for ATP-binding site prediction.
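The abstract does not spell out the interpolation step, so the following is only a minimal sketch of the standard SMOTE idea, not the authors' implementation. The function name `smote`, the parameter names, and the two-dimensional toy feature vectors are all hypothetical and chosen purely for illustration; real residue features would have many more dimensions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: the new point lies on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (invented): 3 "binding" vs 8 "non-binding" feature vectors.
binding = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]
non_binding = [[0.1 * i, 0.05 * i] for i in range(8)]

new_samples = smote(binding, len(non_binding) - len(binding), k=2)
balanced_binding = binding + new_samples
print(len(balanced_binding), len(non_binding))
```

Because every synthetic point is placed between two existing minority samples, no majority-class data has to be discarded, which is the advantage over under-sampling that the abstract points to. A classifier such as a Random Forest would then be trained on the balanced set; that step is not shown here.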
Bibliographic information
This bibliographic record was provided by the National Agricultural Library