ODSC Speakers 15/72

DEHAN, CHASE

Topic: FEATURE SELECTION FROM HIGH DIMENSIONS

Abstract: Feature selection is a known challenge and, in its current state, is more of an art than a science, as the approach can differ depending on the problem the data scientist is looking to solve. While there are methods such as regularization, recursive feature selection, or automated processes like Boruta, these methods all perform differently depending on the type of algorithm used in the training process. With the rise of larger ensembles that combine a wide range of weaker models, using a single feature selection process can lead to an underperforming model. This problem is especially prevalent with the rise of automated machine learning platforms that use a variety of base models.

Automated processes like Boruta showed early promise, as they provided superior performance with Random Forests, but they have deficiencies, including slow computation time, especially with high-dimensional data. Regardless of run time, Boruta performs well with Random Forests but poorly with other algorithms such as boosting or neural networks. Similar differences occur with regularization: LASSO, elastic net, and ridge penalties perform well on linear regressions but poorly on other algorithms.

I am proposing and demonstrating a feature selection algorithm in a similar spirit to Boruta that uses XGBoost as the base model. The algorithm runs in a fraction of the time Boruta takes and has superior performance on a variety of datasets, including one with nearly twenty-two thousand features. These results hold up across a number of UCI Repository datasets. Evaluation results and timings will be shared, along with the underlying code, which will later be converted into a library and posted on CRAN.
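
For readers unfamiliar with the Boruta idea referenced above, the following is a minimal sketch of shadow-feature selection, not the speaker's algorithm. Each real feature is compared against shuffled "shadow" copies, and only features that repeatedly beat the best shadow are kept. As a loudly stated assumption, the importance score here is absolute Pearson correlation, a stand-in chosen so the example runs with the Python standard library alone; a Boruta-style method would instead take importances from a tree model (Random Forests in Boruta, XGBoost in the proposal) fitted on real and shadow features together. The names `abs_correlation`, `shadow_select`, `n_rounds`, and `min_hit_rate` are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_select(columns, target, n_rounds=20, min_hit_rate=0.9, seed=0):
    """Keep features whose score beats the best shadow in most rounds.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric target values, same length as each column.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: shuffled copies of the real columns. Shuffling
        # destroys any relationship with the target, so their scores show
        # what "importance by chance" looks like.
        shadow_scores = []
        for values in columns.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_scores.append(abs_correlation(shuffled, target))
        threshold = max(shadow_scores)
        # A real feature earns a hit when it beats the best shadow.
        for name, values in columns.items():
            if abs_correlation(values, target) > threshold:
                hits[name] += 1
    return [name for name, h in hits.items() if h / n_rounds >= min_hit_rate]


# Synthetic data (an assumption for illustration): one informative column
# and two pure-noise columns.
rng = random.Random(42)
n = 200
signal = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "signal": signal,
    "noise_a": [rng.gauss(0, 1) for _ in range(n)],
    "noise_b": [rng.gauss(0, 1) for _ in range(n)],
}
target = [2 * s + rng.gauss(0, 0.5) for s in signal]

selected = shadow_select(columns, target)
print(selected)
```

The informative column should be retained on this data; a noise column can occasionally pass by chance, which is why Boruta-style methods repeat the comparison over many rounds and apply a statistical test rather than a single cut.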

Bio: Chase is currently a Data Scientist at Progressive Leasing in Draper, Utah, working on a variety of cool projects. Prior to his current position, he was an Assistant Professor of Finance and Economics at the University of South Carolina Upstate. He holds a BS, MS, and PhD, all in Economics, from the University of Utah.