Details

Title: Informative subsampling of big data for statistical inferences
Lecture date: 2016/7/5
Host: Assistant Professor 郑伟 (Zheng Wei)
Affiliation: Indiana University – Purdue University Indianapolis
Venue: Lecture Room 415, New Mathematics Building; Tuesday, July 5, 2016, 9:00-10:00 a.m.
Abstract: For many data analysis tasks, a large database of explanatory variables is readily available, while the responses are missing and very expensive to obtain. A natural remedy is to judiciously select a subset of the population for which the response is to be measured. In the machine learning community, this problem, termed active learning, has been studied extensively. It has applications in marketing, banking, health care, etc. Recently, the subsampling problem for statistical inference has been explored by Ouyang, Apley and Mehrotra (2015a, 2015b, 2016). However, their main focus was on logistic regression for model validation and on the linear model with an interaction term in the study of precision medicine. Here, we propose a generic framework that approximates this NP-hard problem by a continuous problem for which efficient algorithms can be developed. It has three advantages: (i) the information efficiency of the derived subsample can be evaluated without knowing the exact optimal subsample; (ii) it can cope with a very general class of nonlinear models; (iii) the computational cost of the new method is only a small fraction of that of the existing method.