About me

I am a machine learning researcher with experience in developing, enhancing and delivering novel statistical and machine learning methods tailored to healthcare analytics. In 2020 I joined Novartis’ Advanced Methodology and Data Science group, which develops new machine learning methods to improve drug development across multiple projects. I am a member of the editorial board of the Machine Learning Journal (MLJ).

I did my PhD in statistical machine learning, in the area of hypothesis testing and feature selection in semi-supervised scenarios, at the University of Manchester’s Department of Computer Science. Afterwards, I spent many years as a post-doctoral researcher developing novel methodologies for analysing self-reported epidemiological data with Manchester’s Health e-Research Center, clinical trials data for personalised medicine with AstraZeneca, and digital healthcare data for digital biomarker development with Roche.

Disclaimer: this is my personal page. The content is my own responsibility and is not connected to or supported by any entity with which I have been, am now, or will be affiliated.


  • Feature selection
  • Information theory
  • Biomarker discovery for personalised healthcare
  • Digital biomarker discovery
  • Multi-target learning


  • PhD in Machine Learning, 2015

    University of Manchester, UK

  • MSc in Information Systems, 2011

    Aristotle University of Thessaloniki, Greece

  • MSc in Communications and Signal Processing, 2009

    Imperial College London, UK

  • MEng in Electrical and Computer Engineering, 2006

    Aristotle University of Thessaloniki, Greece


Dec 2021: Lasse Hansen’s work on assessing depression using speech emotion recognition systems has been published in Acta Psychiatrica Scandinavica. For those interested, Lasse has a nice Twitter thread that summarizes some of the main points of the paper.

Nov 2021: I presented my work on predictive knockoffs at the PSI Sub Group SIG Webinar on Modern approaches to subgroup identification.

Aug 2021: On September 13th, together with colleagues from industry and academia, we are organising the second edition of the PharML workshop, co-located with ECML-PKDD 2021. Check the exciting program here.

Jul 2021: The paper on using knockoffs for controlled predictive biomarker identification has just been published in Statistics in Medicine. For those interested, check out earlier work from our group (Advanced Methodology and Data Science at Novartis), where the sequential knockoffs algorithm was introduced.

Apr 2021: The paper from my post-doc at Roche has been published open access in Artificial Intelligence in Medicine.

Recent Publications

(2021). A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission. Acta Psychiatrica Scandinavica, 2021.

Link to journal bioRxiv DOI

(2021). Using knockoffs for controlled predictive biomarker identification. Statistics in Medicine, volume 40(25), pages 5453–5473.

Link to journal Code DOI

(2021). A machine learning perspective on the emotional content of Parkinsonian speech. Artificial Intelligence in Medicine, volume 115, May 2021, 102061.

Video Link to journal DOI

(2020). When Size Matters: Markov Blanket with Limited Bit Depth Conditional Mutual Information. International Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM).

Link to proceedings DOI

(2020). Feature selection with limited bit depth mutual information for portable embedded systems. Knowledge-Based Systems, volume 197, 105885.

Link to journal DOI

Research experience


Data Science Fellow in Digital Biomarkers


Dec 2018 – Apr 2020 Basel, Switzerland
I worked as a post-doctoral research scientist at Roche pRED (pharma Research and Early Development) in the exciting area of developing digital biomarkers for neurological disorders. To develop digital biomarkers, sensor data (such as motion, audio and video recordings) are analysed and transformed, using advanced machine learning methods, into meaningful markers. My fellowship focused on developing a speech emotion recognition system as a digital biomarker for patients with Parkinson’s disease. I worked with both real-world and clinical trial data, and I used machine learning modelling to discover novel insights into the emotional content of parkinsonian speech. You can find more details in this presentation.

Data Science Fellow in Personalised Healthcare

AstraZeneca & University of Manchester

Jan 2017 – Nov 2018 Manchester, UK
My fellowship focused on developing machine learning approaches for predictive biomarker discovery, i.e. finding biomarkers that convey information about treatment efficacy. I also developed a methodology to quantify the robustness of biomarker discovery algorithms. I worked with two clinical trials, one for treating advanced non-small cell lung cancer (NSCLC) and one for the prevention of cardiovascular disease among patients undergoing chronic hemodialysis.

Post-doctoral Researcher

Health e-Research Center & University of Manchester, UK

Oct 2015 – Dec 2016 Manchester, UK
My post-doc focused on developing machine learning techniques for correcting under-reporting bias in self-reported epidemiological data. I worked with the Born in Bradford database.

PhD Researcher

University of Manchester, UK

Sep 2011 – Oct 2016 Manchester, UK
My PhD was in statistical machine learning, particularly in developing hypothesis testing and feature selection methods for semi-supervised data. It was funded by the Engineering and Physical Sciences Research Council (EPSRC) and the Propondis Foundation, and my supervisor was Professor Gavin Brown. My PhD thesis was awarded best thesis in the Department of Computer Science, and my work on a methodology for sample size determination in partially labelled data received the best student paper award at ECML/PKDD 2014.


Clinical trials data
R code for the project on deriving predictive biomarkers using information-theoretic methods can be found on GitHub. If you make use of the code, please cite the paper: Distinguishing Prognostic and Predictive Biomarkers: An Information Theoretic Approach.

Semi-supervised data
Matlab code for the project on semi-supervised feature selection can be found on GitHub. If you make use of the code, please cite the paper: Simple strategies for semi-supervised feature selection.

Under-reported data
Matlab code for the project on feature selection with under-reported variables can be found on GitHub. If you make use of the code, please cite the paper: Dealing with under-reported variables: An information theoretic solution.

Positive-unlabelled data
Matlab code for the project on hypothesis testing, power analysis and sample size determination in positive-unlabelled data can be found on the project’s homepage. If you make use of the code, please cite the paper: Statistical hypothesis testing in positive unlabelled data.

Multi-label data
Java code for the project on stratification of multi-label data can be found in Mulan, a Java Library for Multi-Label Learning. If you make use of the code, please cite the paper: On the Stratification of Multi-label Data.
Our algorithm for iterative stratification has been implemented in various other languages, e.g. R and Matlab. In Python, several packages include our algorithm, such as scikit-multilearn and iterative-stratification.