
Toward an Understanding of Adversarial Examples in Clinical Trials

Deep learning systems can be fooled by small, worst-case perturbations of their inputs, known as adversarial examples. This has been almost exclusively studied in supervised learning, on vision tasks. However, adversarial examples in counterfactual …

Distinguishing Prognostic and Predictive Biomarkers: An Information Theoretic Approach

The identification of biomarkers to support decision-making is central to personalized medicine, in both clinical and research scenarios. The challenge can be seen in two halves: identifying predictive markers, which guide the development/use of …

On the Stability of Feature Selection Algorithms

Feature selection is central to modern data science, from exploratory data analysis to predictive model-building. The stability of a feature selection algorithm refers to the robustness of its feature preferences, with respect to data sampling and to …

Simple strategies for semi-supervised feature selection

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and …

Dealing with under-reported variables: An information theoretic solution

Under-reporting occurs in survey data when participants have a reason to give a false negative response to a question, e.g. maternal smoking in epidemiological studies. Failing to correct this misreporting introduces biases and may lead to …

On the Use of Spearman's Rho to Measure the Stability of Feature Rankings

Producing stable feature rankings is critical in many areas, such as in bioinformatics where the robustness of a list of ranked genes is crucial to interpretation by a domain expert. In this paper, we study Spearman’s rho as a measure of stability to …

Exploring the consequences of distributed feature selection in DNA microarray data

Microarray data classification has typically been seen as a difficult challenge for machine learning researchers, mainly because the number of features is high while the sample size is small. Because of this particularity, feature selection is usually …

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information

We study information theoretic methods for ranking biomarkers. In clinical trials, there are two closely related types of biomarkers, predictive and prognostic, and disentangling them is a key challenge. Our first step is to phrase biomarker …

Algorithmic challenges in Big Data analytics

This session studies specific challenges that Machine Learning (ML) algorithms have to tackle when faced with Big Data problems. These challenges can arise when any of the dimensions in an ML problem grows significantly: a) size of training set, b) …

Ranking Biomarkers Through Mutual Information

We study information theoretic methods for ranking biomarkers. In clinical trials, there are two closely related types of biomarkers, predictive and prognostic, and disentangling them is a key challenge. Our first step is to phrase biomarker ranking …