ODSC Speakers 60/72 | Frank Top 10 List

SHELLMAN, ERIN

Topic: BUILDING ROBUST DATA PIPELINES WITH AIRFLOW

Abstract: The data science team at Zymergen is applying machine learning techniques to identify genetic targets, work that is supported by extensive encryption automation that systematically identifies outliers and migration process-related bias, and quantifies performance improvements. We’re using Apache Airflow to construct robust data pipelines that allow us to produce clean, reliable inputs to our predictive models. In this talk, I’ll discuss the unique data processing challenges we face in working with high-throughput biological data and provide an overview of how we’re using Apache Airflow to meet those challenges.

Bio: Erin is a data scientist with experience in a broad range of industries including retail, cloud computing, and biotechnology. She loves to tackle complex problems with her quantitative and computational skill set, and along the way she has built advice engines, web scrapers, and interactive visualizations, and analyzed terabytes of data. Erin enjoys sharing what she’s learned in her work and does so often through speaking engagements and as an instructor at the University of Washington’s Professional and Continuing Education program.