About me

I am a machine learning researcher with experience in developing, enhancing and delivering novel statistical and machine learning methods tailored to healthcare analytics. I am a member of Novartis’ Advanced Methodology and Data Science group, where I focus on developing new machine learning methods to improve drug development across multiple projects.

I did my PhD in statistical machine learning, in the area of hypothesis testing and feature selection in semi-supervised scenarios, at the University of Manchester’s Department of Computer Science. Afterwards, I spent 4.5 years as a post-doctoral researcher developing novel methodologies for analysing self-reported epidemiological data with Manchester’s Health e-Research Center, clinical trials data for personalised medicine with AstraZeneca, and digital healthcare data for digital biomarker development with Roche.

Disclaimer: this is my personal page, the content is my own responsibility and it is not connected to/supported by any entity with which I have been, am now, or will be affiliated.

Interests

  • Feature selection
  • Information theory
  • Biomarker discovery for personalised healthcare
  • Digital biomarker discovery
  • Multi-target learning

Education

  • PhD in Machine Learning, 2015

    University of Manchester, UK

  • MSc in Information Systems, 2011

    Aristotle University of Thessaloniki, Greece

  • MSc in Communications and Signal Processing, 2009

    Imperial College London, UK

  • MEng in Electrical and Computer Engineering, 2006

    Aristotle University of Thessaloniki, Greece

News

Oct 2020: I am now a member of the editorial board of the Machine Learning Journal (MLJ). I also joined the Subgroup Analysis special interest group, which is sponsored by the European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) and the Statisticians in the Pharmaceutical Industry (PSI) organisation.

Sept 2020: On September 18th I will present at the Statistical Learning workshop organised by the Data Mining and Machine Learning group of the University of Geneva. The aim of the workshop is to bring together the research communities of statistics and machine learning to foster a discussion between the two fields and develop research synergies. For more details and registration see here.

June 2020: I joined Novartis’ Advanced Methodology and Data Science (AMDS) group, where I focus on developing novel machine learning methods with the aim of improving drug development in multiple projects. Furthermore, in collaboration with data scientists and biostatisticians, we work to ensure that state-of-the-art statistical and machine learning methods are used at the trial and project level.

May 2020: Together with Lee Cooper (Northwestern University), Naghmeh Ghazaleh (Roche), Jonas Richiardi (Lausanne University Hospital), Damian Roqueiro (ETH Zurich) and Diego Saldana (Roche), I am organising the PharML: Machine Learning for Pharma and Healthcare Applications workshop. The workshop will be co-located with ECML-PKDD 2020 in Ghent, Belgium (September 14-18, 2020). You can find more information on the PharML webpage, and the call for papers is now open. Go ahead and submit your exciting work; the deadline is the 9th of June!

March 2020: My talk at the Applied Machine Learning Days (AMLD) conference, which provides a machine learning perspective on the emotional content of parkinsonian speech, is available on YouTube.

Recent Publications

(2020). Feature selection with limited bit depth mutual information for portable embedded systems. Knowledge-Based Systems, volume 19, 105885.

(2020). Multi-target regression via output space quantization. International Joint Conference on Neural Networks (IJCNN).

(2019). Efficient feature selection using shrinkage estimators. Machine Learning Journal (MLJ), volume 108(8-9), pages 1261–1286.

(2019). On the Stability of Feature Selection in the Presence of Feature Correlations. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML/PKDD). Acceptance rate 130/734 (18%).

(2019). Information Theoretic Multi-Target Feature Selection via Output Space Quantization. Entropy, volume 21(9).

Research experience

Data Science Fellow in Digital Biomarkers

Roche

Dec 2018 – Apr 2020 Basel, Switzerland
I worked as a post-doctoral research scientist at Roche pRED (pharma Research and Early Development) in the exciting area of developing digital biomarkers for neurological disorders. To develop digital biomarkers, sensor data (such as motion, audio and video) are analysed and transformed into meaningful markers using advanced machine learning methods. My fellowship focused on developing a speech emotion recognition system as a digital biomarker for patients with Parkinson’s disease. I worked with both real-world and clinical trial data, and I used machine learning modelling to discover novel insights into the emotional content of parkinsonian speech.

Data Science Fellow in Personalised Healthcare

AstraZeneca & University of Manchester

Jan 2017 – Nov 2018 Manchester, UK
My fellowship focused on developing machine learning approaches for predictive biomarker discovery, i.e. the discovery of biomarkers that convey information about treatment efficacy. A methodology to quantify the robustness of biomarker discovery algorithms was also developed. I worked with data from two clinical trials, one for treating advanced non-small cell lung cancer (NSCLC) and one for the prevention of cardiovascular disease among patients undergoing chronic hemodialysis.

Post-doctoral Researcher

Health e-Research Center & University of Manchester, UK

Oct 2015 – Dec 2016 Manchester, UK
My post-doc focused on developing machine learning techniques for correcting under-reporting bias in self-reported epidemiological data. I worked with the Born in Bradford database.

PhD Researcher

University of Manchester, UK

Sep 2011 – Oct 2016 Manchester, UK
My PhD was in statistical machine learning, and in particular in developing hypothesis testing and feature selection methods for semi-supervised data. It was funded by the Engineering and Physical Sciences Research Council (EPSRC) and the Propondis Foundation, and my supervisor was Professor Gavin Brown. My PhD thesis was awarded best thesis in the Department of Computer Science, while my work on developing a methodology for sample size determination in partially labelled data received the best student paper award at ECML/PKDD 2014.

Software

Clinical trials data
R code for the project on deriving predictive biomarkers using information theoretic methods can be found on GitHub. If you make use of the code, please cite the paper: Distinguishing Prognostic and Predictive Biomarkers: An Information Theoretic Approach.

Semi-supervised data
Matlab code for the project on semi-supervised feature selection can be found on GitHub. If you make use of the code, please cite the paper: Simple strategies for semi-supervised feature selection.

Under-reported data
Matlab code for the project on feature selection with under-reported variables can be found on GitHub. If you make use of the code, please cite the paper: Dealing with under-reported variables: An information theoretic solution.

Positive-unlabelled data
Matlab code for the project on hypothesis testing/power analysis/sample size determination in positive-unlabelled data can be found on the project’s homepage. If you make use of the code, please cite the paper: Statistical hypothesis testing in positive unlabelled data.

Multi-label data
Java code for the project on stratification for multi-label data can be found in Mulan, a Java Library for Multi-Label Learning. If you make use of the code, please cite the paper: On the Stratification of Multi-label Data.
Our algorithm for iterative stratification has been implemented in various other languages, e.g. R and Matlab. In Python, several packages include our algorithm, such as scikit-multilearn and iterative-stratification.
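
For readers who want a feel for the idea, here is a minimal, simplified sketch of iterative stratification in plain Python. It is an illustration only and is not the code from Mulan or from any of the packages above: the function name `iterative_stratification`, its arguments and the equal-sized-folds assumption are all hypothetical choices made for this example. The sketch repeatedly takes the label with the fewest remaining examples and sends each of those examples to the fold that still needs that label the most.

```python
import random
from collections import defaultdict


def iterative_stratification(label_sets, k, seed=0):
    """Split examples into k folds, greedily balancing each label across folds.

    label_sets: list of sets, one per example, holding that example's labels.
    Returns a list of k lists of example indices.
    Assumption for this sketch: all folds should have equal size.
    """
    rng = random.Random(seed)
    n = len(label_sets)

    # How many examples each fold still "wants", overall and per label.
    desired_total = [n / k] * k
    label_counts = defaultdict(int)
    for labels in label_sets:
        for lab in labels:
            label_counts[lab] += 1
    desired_label = {lab: [c / k] * k for lab, c in label_counts.items()}

    folds = [[] for _ in range(k)]
    remaining = set(range(n))

    def assign(i, fold):
        # Place example i in the fold and reduce that fold's demands.
        folds[fold].append(i)
        remaining.discard(i)
        desired_total[fold] -= 1
        for lab in label_sets[i]:
            desired_label[lab][fold] -= 1

    while remaining:
        # Count how often each label occurs among the unassigned examples.
        counts = defaultdict(int)
        for i in remaining:
            for lab in label_sets[i]:
                counts[lab] += 1

        if not counts:
            # Only examples without labels are left: balance fold sizes.
            for i in sorted(remaining):
                best = max(range(k),
                           key=lambda f: (desired_total[f], rng.random()))
                assign(i, best)
            break

        # Handle the rarest remaining label first.
        rarest = min(counts, key=lambda lab: (counts[lab], str(lab)))
        for i in sorted(j for j in remaining if rarest in label_sets[j]):
            # Prefer the fold that most needs this label; break ties by the
            # fold that most needs examples overall, then at random.
            best = max(range(k),
                       key=lambda f: (desired_label[rarest][f],
                                      desired_total[f],
                                      rng.random()))
            assign(i, best)

    return folds


if __name__ == "__main__":
    toy = [{"a"}, {"a"}, {"a"}, {"a"}, {"b"}, {"b"}, set(), set()]
    for fold in iterative_stratification(toy, k=2):
        print(sorted(fold))
```

For real experiments, the maintained implementations in the packages mentioned above are the better choice; this sketch leaves out details such as unequal fold proportions.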