Informative subsampling of big data for statistical inferences
Abstract:
For many data analysis tasks, a large database of explanatory variables is readily available, while the responses are missing and very expensive to obtain. A natural remedy is to judiciously select a subset of the population for which the response is then measured. In the machine learning community, this problem, known as active learning, has been studied extensively. It has applications in marketing, banking, health care, etc. Recently, the subsampling problem for statistical inference has been explored by Ouyang, Apley and Mehrotra (2015a, 2015b, 2016). However, their main focus was on logistic regression for model validation and on the linear model with an interaction term in the study of precision medicine. Here, we propose a generic framework that approximates this NP-hard problem by a continuous problem for which efficient algorithms can be developed. It has three advantages: (i) the information efficiency of the derived subsample can be evaluated without knowing the exact optimal subsample; (ii) it can cope with a very general class of nonlinear models; (iii) the computational cost of the new method is only a small fraction of that of the existing method.
Speaker bio:
Wei Zheng, Ph.D., joins the Department of Mathematical Sciences at IUPUI as an assistant professor after completing his Ph.D. in statistics at the University of Illinois, Chicago (UIC).
Zheng’s research has focused on identifying optimal or efficient crossover designs, which are widely used in clinical trials, pharmaceutical studies, psychological experiments, agricultural field trials, animal feeding experiments and many other branches of science. His interests include optimal designs for both parameter estimation and hypothesis testing under different types of models, both linear and nonlinear.
Zheng has also worked on the limiting distributions of the sample covariance of a long-memory time series. He will continue to be interested in asymptotic theory for statistics from dependent observations. Using his expertise in both design and time series, he would like to explore adaptive designs, where the optimal design depends on the unknown parameter to be estimated, as well as spatio-temporal modeling in image processing and the environmental and geographical sciences, where the design aspect has barely been touched.
Dr. Zheng received his B.S. in mathematics, with a specialization in statistics, from Zhejiang University in China.
All faculty and students are welcome to attend!
Southern China Center for Statistical Science (华南统计科学研究中心)
2016/7/4