Using knockoffs for controlled predictive biomarker identification


One of the key challenges of personalized medicine is to identify which patients will respond positively to a given treatment. The area of subgroup identification focuses on this challenge, that is, on identifying groups of patients that exhibit desirable characteristics, such as an enhanced treatment effect. A crucial first step towards subgroup identification is to identify the baseline variables (e.g., biomarkers) that influence the treatment effect, which are known as predictive variables. Many subgroup discovery algorithms return importance scores that capture the variables’ predictive strength. However, a major limitation of these scores is that they do not answer the core question: “Which variables are actually predictive?” We answer this question using the knockoff framework, a general framework for controlling the false discovery rate that has so far been used for prognostic variable selection. In contrast, our work is the first to use knockoffs for predictive variable selection. We introduce two novel knockoff filters: one parametric, building on variable importance scores derived from a penalized linear regression model, and one non-parametric, building on causal forest variable importance scores. We conduct extensive simulations to validate the performance of the proposed methodology, and we also apply the proposed methods to data from a randomized clinical trial.
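To make the selection step concrete, the following is a minimal sketch of the generic knockoff+ thresholding rule from the general knockoff framework, not the specific filters proposed in this work. It assumes a vector W of knockoff statistics has already been computed, where a large positive W[j] means variable j looks more important than its knockoff copy. How W is built from penalized regression or causal forest importance scores is not described here and is not shown. The function names and the example values are hypothetical.

```python
import numpy as np


def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    offset=1 gives the knockoff+ rule; offset=0 gives the less conservative
    original knockoff rule.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics beyond -t stand in for the false positives
        # among the positive statistics beyond t.
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target level: select nothing


def knockoff_select(W, q=0.1, offset=1):
    """Indices of variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q, offset))


# Hypothetical statistics for ten candidate variables.
W_example = [5, 4, 3, 2.5, 2, -1, 0.5, -0.3, 1.5, 6]
selected = knockoff_select(W_example, q=0.2)
```

With a small number of variables the offset makes the rule conservative, so an empty selection at a strict target level is an expected outcome rather than a failure.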

Statistics in Medicine, volume 40(25), pages 5453–5473.