Abstract: This work describes multiple predictive machine learning models (eg, random forests, k-nearest neighbors, self-organizing maps, naïve Bayes) built in support of neglected tropical diseases drug discovery. Screening data, retrieved from ChEMBL-NTD (https://www.ebi.ac.uk/chemblntd/), were used to construct and validate the models. Programs written using the R software ecosystem (http://www.r-project.org/) and the Python software ecosystem (https://www.python.org/) were used for data retrieval, curation, visualization, analysis, mining, and reporting. End-to-end workflows using both of these software ecosystems will be presented. We illustrate how one might access these models using Shiny, a web application framework for R (http://shiny.rstudio.com/), and Jupyter notebooks (http://jupyter.org/). Each of the models is collected into a compendium, a ‘container’ for all those elements that make up a model and its associated description: the primary data, the annotated computational code, figures, tables, and derived data, together with textual documentation and conclusions. These individual components are required to enable the practice of reproducible research: one may re-run the analyses, run the analyses with new data sets, or modify the code for other purposes. The primary purpose of this work is to make the functionality of the R scripts and Python scripts (ie, the predictive models) available to interested parties, regardless of their knowledge of R or Python. Free and open access to predictive models supporting neglected diseases drug discovery is meant to complement the research activities of all investigators, and in particular those with limited access to computational tools and algorithms.
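
As a rough illustration of one of the model types named in the abstract, the sketch below implements a k-nearest neighbors classifier in plain Python. It is not the author's code: the abstract does not say which descriptors, similarity measure, or validation scheme were used, so the Tanimoto similarity on fingerprint bit sets, the leave-one-out check, the toy data, and every function name here are assumptions made only for illustration.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query, training, k=3):
    """Majority vote over the k training compounds most similar to the query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Predict each compound from all the others; return the fraction correct."""
    hits = 0
    for i, (bits, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(bits, rest, k) == label
    return hits / len(data)


# Hypothetical toy data (NOT from ChEMBL-NTD): (set of on-bits, activity label).
TRAINING = [
    (frozenset({1, 2, 3, 4}), "active"),
    (frozenset({1, 2, 3, 5}), "active"),
    (frozenset({2, 3, 4, 6}), "active"),
    (frozenset({7, 8, 9}), "inactive"),
    (frozenset({7, 8, 10}), "inactive"),
    (frozenset({8, 9, 11}), "inactive"),
]

if __name__ == "__main__":
    print(knn_predict(frozenset({1, 2, 3}), TRAINING))
    print(leave_one_out_accuracy(TRAINING))
```

In a real workflow the bit sets would come from a cheminformatics toolkit and the labels from curated screening results; the same predict-and-validate shape applies to the other model types listed.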

Bio: Paul received his PhD in Physical Chemistry from Rensselaer Polytechnic Institute and held a postdoctoral fellowship with IBM Data Systems Division. He worked as a computational chemist (QSAR, QSPR, ligand-based and structure-based pharmacophore development, cheminformatics) at Sterling Winthrop Research Institute, Procept, Pfizer, and Scynexis. He is currently a data scientist at Syngenta Biotechnology, Inc, using data visualization, analysis, and mining tools to build descriptive, predictive, and prescriptive models.