ODSC Speakers 41/72

Bio: I am at ICSI and the Department of Statistics at UC Berkeley, and I am also in the RISELab (formerly the AMPLab) in the Department of Computer Science.

Most of my work focuses on the theory and practice of what is now called big data, although I was doing it back when it was just massive, and prior to that when it was just large. On the theory side, we develop algorithms and statistical methods for matrix, graph, regression, optimization, and related problems. On the practice side, we provide implementations (e.g., on single-machine, distributed data system, and supercomputer environments), and we apply these methods to a range of problems in internet and social media analysis, social network analysis, as well as genetics, mass spec imaging, astronomy, climate, and a range of other scientific applications.

Research interests

  • Algorithmic and statistical aspects of modern large-scale data analysis.
  • Randomized linear algebra and randomized numerical linear algebra.
  • Implicit regularization and implicit optimization methods in scalable approximation algorithms.
  • Graph approximation algorithms and applications to large social and information networks.
  • Applications to DNA microarray, SNP, astronomical, medical imaging, and other scientific data.

A lot of my work has focused on Randomized Linear Algebra, i.e., using random sampling and random projection methods to solve very large matrix-based problems; developing geometric network analysis tools, i.e., using scalable approximation algorithms with a geometric or statistical flavor to analyze the structure and dynamics of large informatics graphs; developing approximate computation and regularization methods for large informatics graphs; applications to community detection, clustering, and information dynamics in large social and information networks; and applications to DNA single nucleotide polymorphism (SNP) data, astronomical and medical imaging data, and large-scale statistical data analysis more generally.

In the past, I developed and analyzed algorithms for large matrix, graph, and regression problems, and I applied these and related tools to the statistical data analysis of extremely large scientific and Internet data sets. For example, I worked on large-scale web analytics, machine learning, and query log analysis; applications of graph partitioning algorithms to clustering and community identification; and applications of randomized matrix algorithms to hyperspectral medical image data, DNA microarray data, and DNA SNP data.

In the more distant past, I also worked on developing and analyzing Monte Carlo algorithms for performing computations on extremely large matrices, e.g., the additive-error and relative-error CUR matrix decompositions. Past research has also included work in computational statistical mechanics on the development and analysis of the TIP5P model of liquid water, as well as work in both computational and experimental biophysics on proteins and protein-nucleic acid interactions.