In an era in which the volume and complexity of datasets is continuously growing, feature selection techniques have become indispensable to extract useful information from huge amounts of data. However, existing algorithms may not scale well when …
A key challenge in information theoretic feature selection is to estimate mutual information expressions that capture three desirable terms: the relevancy of a feature with the output, the redundancy and the complementarity between groups of …
Deep learning systems can be fooled by small, worst-case perturbations of their inputs, known as adversarial examples. This has been almost exclusively studied in supervised learning, on vision tasks. However, adversarial examples in counterfactual …
The identification of biomarkers to support decision-making is central to personalized medicine, in both clinical and research scenarios. The challenge can be seen in two halves: identifying predictive markers, which guide the development/use of …
Feature Selection is central to modern data science, from exploratory data analysis to predictive model-building. The stability of a feature selection algorithm refers to the robustness of its feature preferences, with respect to data sampling and to …
What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and …
Under-reporting occurs in survey data when there is a reason for participants to give a false negative response to a question, e.g. maternal smoking in epidemiological studies. Failing to correct this misreporting introduces biases and it may lead to …
Producing stable feature rankings is critical in many areas, such as in bioinformatics where the robustness of a list of ranked genes is crucial to interpretation by a domain expert. In this paper, we study Spearman’s rho as a measure of stability to …
Microarray data classification has been typically seen as a difficult challenge for machine learning researchers mainly due to its high dimension in features while sample size is small. Because of this particularity, feature selection is usually …
We study information theoretic methods for ranking biomarkers. In clinical trials, there are two, closely related, types of biomarkers: predictive and prognostic, and disentangling them is a key challenge. Our first step is to phrase biomarker …