DR. MOUNT, JOHN
Topic: MYTHS OF DATA: PRACTICAL ISSUES YOU CAN AND CAN NOT IGNORE.
Abstract: Modern data science has become bold, acting as if big data is always an effective tool, no matter how you prepare and process your data.
In this 90-minute workshop we perform a thorough review of the common statistical issues that do and do not remain critical problems when you are data rich. For instance: large data gives the apparent luxury of wishing away some common model evaluation biases. Conversely, to work agilely, data scientists must act as if a few “folk axioms” of statistical inference were true, though they are not.
We will cover, through lecture and follow-along exercises, what commonly goes wrong when data scientists do not understand these issues. We show how to detect and fix these situations. We start with seemingly operational issues (many variables, “wide data,” missing values, large cardinality categoricals, novel categorical levels) and how to correct them in real-world data. We work through these and demonstrate that careless correction of these issues can lead to statistically invalid machine learning procedures that give poor results in production. We then introduce automated procedures that are both practical and statistically valid.
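As a small illustration of two of the operational issues named above (missing values and novel categorical levels), here is a minimal sketch in plain Python. It is not the workshop's procedure; the helper names `fit_levels` and `encode`, the `min_count` threshold, and the toy data are all assumptions made for this example.

```python
from collections import Counter

def fit_levels(values, min_count=2):
    """Learn which levels are common enough in training data to get an indicator.

    Hypothetical helper: the min_count threshold is an illustrative choice.
    """
    counts = Counter(v for v in values if v is not None)
    return sorted(level for level, n in counts.items() if n >= min_count)

def encode(value, levels):
    """Indicator-encode one value against the levels learned from training data.

    Missing values and rare or never-seen levels get explicit indicator
    columns instead of raising an error or being silently dropped.
    """
    row = {f"is_{level}": int(value == level) for level in levels}
    row["is_missing"] = int(value is None)
    row["is_other"] = int(value is not None and value not in levels)
    return row

# Toy training column: "TX" appears once, so it is treated as a rare level.
train = ["CA", "NY", "CA", None, "NY", "TX"]
levels = fit_levels(train)

# "WA" never appeared in training; it is routed to the "other" indicator.
print(encode("WA", levels))
print(encode(None, levels))
```

The point of the design is that the set of levels is fixed using training data only, so a value first seen in production still maps to a well-defined row.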
Bio: Dr. John Mount is a principal consultant at Win-Vector LLC, a San Francisco data science consultancy. John has worked as a computational scientist in biotechnology and as a stock-trading algorithm designer, and has managed a research team for Shopping.com (now an eBay company). John is the coauthor of Practical Data Science with R (Manning Publications, 2014). John started his advanced education in mathematics at UC Berkeley and holds a Ph.D. in computer science from Carnegie Mellon.