About me

I am a machine learning researcher with experience in developing, enhancing and delivering novel statistical and machine learning methods tailored to healthcare analytics. In 2020 I joined Novartis’ Advanced Methodology and Data Science group, which develops new machine learning methods to improve drug development across multiple projects. I am a member of the editorial board of the Machine Learning Journal (MLJ) and vice-chair of the technical committee on Statistical Pattern Recognition Techniques of the International Association for Pattern Recognition (IAPR).

I did my PhD in statistical machine learning, in the area of hypothesis testing and feature selection in semi-supervised scenarios, at the University of Manchester’s Department of Computer Science. Afterwards, I spent several years as a post-doctoral researcher developing novel methodologies for analysing self-reported epidemiological data with Manchester’s Health e-Research Center, clinical trials data for personalised medicine with AstraZeneca, and digital healthcare data for digital biomarker development with Roche.

Disclaimer: this is my personal page, the content is my own responsibility and it is not connected to/supported by any entity with which I have been, am now, or will be affiliated.

Interests

  • Feature selection
  • Information theory
  • Biomarker discovery for personalised healthcare
  • Digital biomarker discovery
  • Multi-target learning

Education

  • PhD in Machine Learning, 2015

    University of Manchester, UK

  • MSc in Information Systems, 2011

    Aristotle University of Thessaloniki, Greece

  • MSc in Communications and Signal Processing, 2009

    Imperial College London, UK

  • MEng in Electrical and Computer Engineering, 2006

    Aristotle University of Thessaloniki, Greece

News

Jan 2023: I have been appointed vice-chair of the technical committee on Statistical Pattern Recognition Techniques of the International Association for Pattern Recognition (IAPR).

Dec 2022: Our paper on benchmarking methods for characterising treatment effect heterogeneity in clinical trials has been published in Biometrical Journal. If you are interested in simulating datasets with heterogeneous treatment effects, you can check our benchtm package.

Sept 2022: Organised the session on knockoffs and multiple testing with biomedical applications at the Multiple Comparison Procedures (MCP) conference, where Lucas Janson, Zhimei Ren, Jinzhou Li and Asher Spector presented their exciting work.

May 2022: Extremely happy to present our work at Novartis on quantifying uncertainty in machine learning-based predictive biomarker discovery to the MSc in Data and Web Science of the Aristotle University of Thessaloniki. More details here.

April 2022: This year we are organising the third edition of the PharML workshop, co-located with ECML-PKDD 2022. The call for papers is officially open: https://easychair.org/cfp/pharml2022.

Dec 2021: Lasse Hansen’s work on assessing depression using speech emotion recognition systems has been published in Acta Psychiatrica Scandinavica. For those interested, Lasse has a nice Twitter thread that summarizes some of the main points of the paper.

Recent Publications

(2022). Comparing algorithms for characterizing treatment effect heterogeneity in randomized trials. Biometrical Journal.

(2022). A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission. Acta Psychiatrica Scandinavica.

(2021). Using knockoffs for controlled predictive biomarker identification. Statistics in Medicine, volume 40(25), pages 5453–5473.

(2021). A machine learning perspective on the emotional content of Parkinsonian speech. Artificial Intelligence in Medicine, volume 115, 102061.

(2020). When Size Matters: Markov Blanket with Limited Bit Depth Conditional Mutual Information. International Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM).

Research experience

Current role: Associate Director of Data Science

Novartis

Jun 2020 – Present Basel, Switzerland
I am a member of Novartis’ Advanced Methodology and Data Science group, which develops new machine learning methods to improve drug development across multiple projects.

Data Science Fellow in Digital Biomarkers

Roche

Dec 2018 – Apr 2020 Basel, Switzerland
I worked as a post-doctoral research scientist at Roche pRED (pharma Research and Early Development) in the exciting area of developing digital biomarkers for neurological disorders. To develop digital biomarkers, sensor data (such as motion, audio and video) are analysed and transformed into meaningful markers using advanced machine learning methods. My fellowship focused on developing a speech emotion recognition system as a digital biomarker for patients with Parkinson’s disease. I worked with both real-world and clinical trial data, and I used machine learning modelling to discover novel insights into the emotional content of Parkinsonian speech. You can find more details in this presentation.

Data Science Fellow in Personalised Healthcare

AstraZeneca & University of Manchester

Jan 2017 – Nov 2018 Manchester, UK
My fellowship focused on developing machine learning approaches for predictive biomarker discovery, i.e. the discovery of biomarkers that convey information about treatment efficacy. I also developed a methodology to quantify the robustness of biomarker discovery algorithms. I worked with data from two clinical trials, one for treating advanced non-small cell lung cancer (NSCLC) and one for the prevention of cardiovascular disease among patients undergoing chronic hemodialysis.

Post-doctoral Researcher

Health e-Research Center & University of Manchester, UK

Oct 2015 – Dec 2016 Manchester, UK
My post-doc focused on developing machine learning techniques for correcting under-reporting bias in self-reported epidemiological data. I worked with the Born in Bradford database.

PhD Researcher

University of Manchester, UK

Sep 2011 – Oct 2015 Manchester, UK
My PhD was in statistical machine learning, and in particular in developing hypothesis testing and feature selection methods for semi-supervised data. It was funded by the Engineering and Physical Sciences Research Council (EPSRC) and the Propondis Foundation, and my supervisor was Professor Gavin Brown. My PhD thesis was awarded best thesis in the Department of Computer Science, and my work on a methodology for sample size determination in partially labelled data received the best student paper award at ECML/PKDD 2014.

Software

Clinical trials data
R code for the project on deriving predictive biomarkers using information theoretic methods can be found on GitHub. If you make use of the code, please cite the paper: Distinguishing Prognostic and Predictive Biomarkers: An Information Theoretic Approach.

Semi-supervised data
Matlab code for the project on semi-supervised feature selection can be found on GitHub. If you make use of the code, please cite the paper: Simple strategies for semi-supervised feature selection.

Under-reported data
Matlab code for the project on feature selection with under-reported variables can be found on GitHub. If you make use of the code, please cite the paper: Dealing with under-reported variables: An information theoretic solution.

Positive-unlabelled data
Matlab code for the project on hypothesis testing, power analysis and sample size determination in positive-unlabelled data can be found on the project’s homepage. If you make use of the code, please cite the paper: Statistical hypothesis testing in positive unlabelled data.

Multi-label data
Java code for the project on stratification of multi-label data can be found in Mulan, a Java Library for Multi-Label Learning. If you make use of the code, please cite the paper: On the Stratification of Multi-label Data.
Our algorithm for iterative stratification has been implemented in various other languages, e.g. R and Matlab. In Python, several packages include our algorithm, such as scikit-multilearn and iterative-stratification.