Rodrigo Quiroga* and Marcos Villarreal
Pages 1-13 (13)
Structure-based drug discovery methods, such as molecular docking and virtual screening, have become invaluable tools in developing novel drugs. At the core of these methods are Scoring Functions (SFs), which predict the binding affinity between ligands and protein targets. This study aims to review and contextualize the challenges and best practices in training novel scoring functions to improve their accuracy and generalizability in predicting protein-ligand binding affinities. Effective training of scoring functions requires careful attention to the quality of training data and methodologies. We emphasize the need for robust training strategies to produce consistent and generalizable SFs. Key considerations include addressing hidden biases and overfitting in machine-learning models, as well as ensuring the use of high-quality, unbiased datasets for both training and evaluation of SFs. Innovative hybrid methods, combining the advantages of empirical and machine-learning approaches, hold promise for outperforming current scoring functions while displaying greater generalizability and versatility.
Keywords: Molecular docking, scoring function, computational drug discovery, virtual screening, machine learning, deep learning.