Paper Title
Adopting Protein Language Models to Identify and Analyze Protein-Protein Interactions in Anemia
Abstract
Understanding the molecular mechanisms underlying disorders such as anemia requires analyzing protein-protein interactions (PPIs), which are crucial for identifying potential biomarkers and treatments in biomedical research. Recently, transformer-based deep learning models, known for capturing complex relationships in sequential data, have shown promise in biological sequence analysis. In this study, we evaluate three transformer-based models, ProtBERT, DistilBERT, and ELECTRA, for predicting anemia-related PPIs, using the STRING database as our primary dataset. We optimized these models for binary classification to distinguish anemia-linked PPIs from unrelated ones. Our results indicate that DistilBERT achieved the highest accuracy at 82%, while ELECTRA and ProtBERT reached accuracies of 58.4% and 55%, respectively. ELECTRA also showed a 77% average out-of-sample accuracy. These findings highlight the potential of deep learning models in disease-specific PPI prediction and emphasize the need for further research to improve feature engineering and model optimization for more reliable biomedical applications.
Keywords - Protein-Protein Interactions, Anemia, Transformer Models, Deep Learning, Biomedical Applications, Machine Learning, ProtBERT, DistilBERT, ELECTRA