ODSC Speakers 54/72

Abstract:  As R developers, our first instinct may be to approach databases the same way we do regular files. We start by reading all of the data into memory and then proceed to data exploration. But what if there is a better way?

The dplyr package provides a set of verbs designed so that they can be used in succession and interchangeably to gain an understanding of the data iteratively. Another nice thing about dplyr is that it can interact with databases directly. It accomplishes this by translating the dplyr verbs into SQL queries. This powerful feature allows us to ‘speak’ directly with the database from R:

– Run the data exploration over all of the data – Instead of coming up with a plan to decide what data to import, we can focus on analysis inside the database, which in turn should yield deeper insights.

– Use the SQL Engine to run the data transformations – We are, in effect, pushing the computation to the database because dplyr is sending SQL queries to the database.

– Collect a targeted dataset – After becoming familiar with the data and choosing the data points that will be either shared or modeled, a final query can be used to bring only that data back into memory in R.

– All your code is in R! – Because we are using dplyr to communicate with the database, there is no need to change languages or tools to perform the data exploration.
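
The workflow in the list above can be sketched in a few lines of R. This is a minimal illustration, not the speaker's own material: it assumes the DBI, RSQLite, dplyr and dbplyr packages are installed, and it uses an in-memory SQLite database loaded with the built-in mtcars data as a stand-in for a real database server. The names cars_db, summary_query and local_summary are hypothetical.

```r
library(DBI)
library(dplyr)  # dbplyr must also be installed; dplyr uses it for SQL translation

# Assumption: an in-memory SQLite database stands in for a real database server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# A lazy reference to the remote table; no rows are pulled into R yet.
cars_db <- tbl(con, "mtcars")

# dplyr verbs build up a query instead of operating on local data.
summary_query <- cars_db %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(n_cars = n(), avg_mpg = mean(mpg, na.rm = TRUE))

# Inspect the SQL that dplyr generated for the database engine.
show_query(summary_query)

# Bring only the small, targeted result back into memory in R.
local_summary <- collect(summary_query)

dbDisconnect(con)
```

Nothing is computed in R until collect() is called; the filtering and aggregation run inside the database, and only the summarised rows are returned. Against a production database, only the dbConnect() call should need to change.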

Bio:  Edgar has a background in deploying business reporting and Business Intelligence solutions. He has posted multiple articles and blog posts sharing analytics insights and server infrastructure for Data Science.