Insights into distributed feature ranking

In an era in which the volume and complexity of datasets are continuously growing, feature selection techniques have become indispensable for extracting useful information from huge amounts of data. However, existing algorithms may not scale well when …

Multi-target feature selection through output space clustering

A key challenge in information theoretic feature selection is to estimate mutual information expressions that capture three desirable terms: the relevancy of a feature with the output, the redundancy and the complementarity between groups of …

Toward an Understanding of Adversarial Examples in Clinical Trials

Deep learning systems can be fooled by small, worst-case perturbations of their inputs, known as adversarial examples. This has been almost exclusively studied in supervised learning, on vision tasks. However, adversarial examples in counterfactual …

Distinguishing Prognostic and Predictive Biomarkers: An Information Theoretic Approach

The identification of biomarkers to support decision-making is central to personalized medicine, in both clinical and research scenarios. The challenge can be seen in two halves: identifying predictive markers, which guide the development/use of …

On the Stability of Feature Selection Algorithms

Feature selection is central to modern data science, from exploratory data analysis to predictive model-building. The stability of a feature selection algorithm refers to the robustness of its feature preferences, with respect to data sampling and to …

Simple strategies for semi-supervised feature selection

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and …

Dealing with under-reported variables: An information theoretic solution

Under-reporting occurs in survey data when there is a reason for participants to give a false negative response to a question, e.g. maternal smoking in epidemiological studies. Failing to correct this misreporting introduces biases and it may lead to …

On the Use of Spearman's Rho to Measure the Stability of Feature Rankings

Producing stable feature rankings is critical in many areas, such as in bioinformatics where the robustness of a list of ranked genes is crucial to interpretation by a domain expert. In this paper, we study Spearman’s rho as a measure of stability to …

Exploring the consequences of distributed feature selection in DNA microarray data

Microarray data classification has typically been seen as a difficult challenge for machine learning researchers, mainly because of its high feature dimensionality and small sample size. Because of this particularity, feature selection is usually …

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information

We study information theoretic methods for ranking biomarkers. In clinical trials, there are two closely related types of biomarkers, predictive and prognostic, and disentangling them is a key challenge. Our first step is to phrase biomarker …