How to make model-free feature screening approaches for full data applicable to the missing response case?
Abstract: It is quite challenging to develop model-free feature screening approaches for missing response problems, since the existing standard methods for missing data analysis cannot be applied directly to the high-dimensional case. This paper develops a novel technique that borrows information from the missingness indicators, so that any feature screening procedure for ultrahigh-dimensional covariates with full data can be applied to the missing response case. The technique rests on proving that the set of active predictors for the response is a subset of the set of active predictors for the product of the response and the missingness indicator (that is, the response with its missing values set to zero). Any standard model-free feature screening procedure that has the screening property for full data can then be applied to estimate the latter set. Hence, the probability that the estimated set contains the latter set, and therefore the former, tends to one. It is also shown that the complete case (CC) approach preserves the screening property of any feature screening approach that has this property for full data. As an alternative, a two-step approach is developed for obtaining a feature screening estimator of the active predictor set of interest. A simulation study was conducted to compare the proposed methods with the CC approach, and a real data analysis was used to illustrate the proposed method. Both the simulation study and the real data analysis indicate that the proposed zero imputation feature screening method outperforms the CC method and the two-step method.
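The zero imputation idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: distance correlation is used as one example of a model-free screening utility (the abstract does not name a specific one), the function names `distance_correlation` and `zero_imputation_screen` are hypothetical, the missingness indicator `delta` is taken to be 1 when the response is observed, and the simulated model and the cutoff `d = 20` are arbitrary.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (illustrative utility)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def zero_imputation_screen(X, y, delta, d):
    """Rank features by their utility against delta * Y and keep the top d.

    Missing responses (delta == 0) are replaced by zero, which equals the
    product of the response and the missingness indicator.
    """
    y_zero = np.where(delta == 1, y, 0.0)
    scores = np.array(
        [distance_correlation(X[:, j], y_zero) for j in range(X.shape[1])]
    )
    top = np.argsort(scores)[::-1][:d]
    return top, scores


# Hypothetical simulated data: features 0 and 1 drive the response,
# feature 2 drives the probability that the response is observed.
rng = np.random.default_rng(0)
n, p = 200, 300
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=n)
prob_obs = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 2])))
delta = (rng.uniform(size=n) < prob_obs).astype(int)
y_obs = np.where(delta == 1, y, np.nan)  # unobserved responses stored as NaN

selected, scores = zero_imputation_screen(X, y_obs, delta, d=20)
print(sorted(selected.tolist()))
```

In this sketch the screened set targets the active predictors of the zero-imputed response, which may also include predictors of the missingness mechanism (feature 2 here). That is consistent with the subset relation stated in the abstract: the estimated set is meant to cover the active predictors of the response, possibly along with some extra ones.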