<strong>Paper Title</strong><br>

A DUAL-MODEL MACHINE LEARNING FRAMEWORK FOR PREDICTING LIPOPHILICITY (LOGP) AND AQUEOUS SOLUBILITY (LOGS) FROM MOLECULAR DESCRIPTORS<br>

<br>


<strong>Abstract</strong><br>

Estimating key physicochemical properties of chemical compounds, such as lipophilicity (LogP) and aqueous solubility (LogS), is important in pharmaceutical and computational sciences. This work proposes a dual-model machine learning approach that predicts LogP and LogS from molecular descriptors. Separate regression models for lipophilicity and aqueous solubility are developed using the Random Forest, XGBoost, and LightGBM algorithms. The two models were trained on the Mannhold LogP dataset and the ESOL (Delaney) LogS dataset, respectively. Performance is evaluated using mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). The results indicate that gradient boosting algorithms handle nonlinear dependencies well, with the LightGBM regressor giving the best results. In addition, a procedure was developed to select the descriptors that most affect prediction accuracy. Finally, a unified pipeline was designed to predict LogP, LogS, and drug-likeness according to Lipinski's rule of five.

Keywords - Machine Learning, Molecular Descriptors, Lipophilicity (LogP), Aqueous Solubility (LogS), LightGBM, Drug-Likeness Prediction
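
For readers less familiar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how MAE, RMSE, and R² are conventionally computed for a regression model. The function name `regression_metrics` and the sample values are assumptions made for illustration; they are not taken from the paper's code or results.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R² for paired true/predicted values.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: square root of the mean squared residual.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - (residual SS / total SS).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Made-up LogP-like values, used only to exercise the function.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(regression_metrics(observed, predicted))
```

Lower MAE and RMSE indicate smaller prediction errors, while an R² closer to 1 indicates that the model explains more of the variance in the measured property.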