I am a machine learning researcher with experience in developing, enhancing and delivering novel statistical and machine learning methods tailored to healthcare analytics. I am a member of Novartis’ Advanced Methodology and Data Science group, where I develop new machine learning methods to improve drug development across multiple projects.
I did my PhD in statistical machine learning, in the area of hypothesis testing and feature selection in semi-supervised scenarios, at the University of Manchester’s Department of Computer Science. Afterwards, I spent several years as a post-doctoral researcher developing novel methodologies for analysing self-reported epidemiological data with Manchester’s Health e-Research Center, clinical trials data for personalised medicine with AstraZeneca, and digital healthcare data for digital biomarker development with Roche.
Disclaimer: this is my personal page, the content is my own responsibility and it is not connected to/supported by any entity with which I have been, am now, or will be affiliated.
PhD in Machine Learning, 2015
University of Manchester, UK
MSc in Information Systems, 2011
Aristotle University of Thessaloniki, Greece
MSc in Communications and Signal Processing, 2009
Imperial College London, UK
MEng in Electrical and Computer Engineering, 2006
Aristotle University of Thessaloniki, Greece
Oct 2020: I am now a member of the editorial board of the Machine Learning Journal (MLJ). I also joined the Subgroup Analysis special interest group, which is sponsored by the European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) and the Statisticians in the Pharmaceutical Industry (PSI) organisation.
Sept 2020: On September 18th I will present at the Statistical Learning workshop organised by the Data Mining and Machine Learning group of the University of Geneva. The aim of the workshop is to bring together the research communities of statistics and machine learning to foster a discussion between the two fields and develop research synergies. For more details and registration see here.
June 2020: I joined Novartis’ Advanced Methodology and Data Science (AMDS) group, where I focus on developing novel machine learning methods with the aim of improving drug development in multiple projects. Furthermore, in collaboration with data scientists and biostatisticians, I work to ensure that state-of-the-art statistical and machine learning methods are used at the trial and project level.
May 2020: Together with Lee Cooper (Northwestern University), Naghmeh Ghazaleh (Roche), Jonas Richiardi (Lausanne University Hospital), Damian Roqueiro (ETH Zurich) and Diego Saldana (Roche), I am organising the PharML: Machine Learning for Pharma and Healthcare Applications workshop. The workshop will be co-located with ECML-PKDD 2020 in Ghent, Belgium (September 14-18, 2020). You can find more information on the PharML webpage, and the call for papers is now open. Go ahead and submit your exciting work; the deadline is the 9th of June!
March 2020: My talk at the Applied Machine Learning Days (AMLD) conference, which gives a machine learning perspective on the emotional content of Parkinsonian speech, is available on YouTube.
Clinical trials data
R code for the project of deriving predictive biomarkers using information theoretic methods can be found on GitHub. If you make use of the code, please cite the paper: Distinguishing Prognostic and Predictive Biomarkers: An Information Theoretic Approach.
Semi-supervised data
Matlab code for the project of semi-supervised feature selection can be found on GitHub. If you make use of the code, please cite the paper: Simple strategies for semi-supervised feature selection.
Under-reported data
Matlab code for the project of feature selection with under-reported variables can be found on GitHub. If you make use of the code, please cite the paper: Dealing with under-reported variables: An information theoretic solution.
Positive-unlabelled data
Matlab code for the project of hypothesis testing, power analysis and sample size determination in positive-unlabelled data can be found on the project’s homepage. If you make use of the code, please cite the paper: Statistical hypothesis testing in positive unlabelled data.
Multi-label data
Java code for the project of stratification for multi-label data can be found in Mulan, a Java Library for Multi-Label Learning. If you make use of the code, please cite the paper: On the Stratification of Multi-label Data. Our algorithm for iterative stratification has been implemented in various other languages, e.g. R and Matlab. In Python, several packages include our algorithm, such as scikit-multilearn and iterative-stratification.
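For readers who want a feel for how iterative stratification works, here is a minimal pure-Python sketch. It is a simplified illustration and not the Mulan implementation or the code in any of the packages mentioned here: the function name `iterative_stratification`, the equal fold proportions, the deterministic tie-breaking and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def iterative_stratification(label_sets, k):
    """Assign each sample to one of k folds (hypothetical helper, simplified).

    label_sets: list of sets, one set of string labels per sample.
    Returns a list with the fold index of each sample.
    """
    n = len(label_sets)
    # Desired number of samples per fold (equal-sized folds assumed).
    want_fold = [n / k] * k

    # Desired number of samples per label in each fold.
    totals = defaultdict(int)
    for labels in label_sets:
        for lab in labels:
            totals[lab] += 1
    want_label = [{lab: cnt / k for lab, cnt in totals.items()} for _ in range(k)]

    remaining = set(range(n))
    assignment = [None] * n

    def place(idx, fold):
        # Record the assignment and update the remaining demand of that fold.
        assignment[idx] = fold
        remaining.discard(idx)
        want_fold[fold] -= 1
        for lab in label_sets[idx]:
            want_label[fold][lab] -= 1

    while True:
        # Count how many unassigned samples still carry each label.
        counts = defaultdict(int)
        for idx in remaining:
            for lab in label_sets[idx]:
                counts[lab] += 1
        if not counts:
            break
        # Handle the rarest remaining label first (ties: alphabetical order).
        rare = min(sorted(counts), key=lambda lab: counts[lab])
        for idx in sorted(i for i in remaining if rare in label_sets[i]):
            # Pick the fold that most needs this label, then the fold that
            # most needs samples overall, then the lowest fold index.
            fold = max(range(k), key=lambda j: (want_label[j][rare], want_fold[j], -j))
            place(idx, fold)

    # Samples without any label go to the folds that still need samples.
    for idx in sorted(remaining):
        fold = max(range(k), key=lambda j: (want_fold[j], -j))
        place(idx, fold)
    return assignment


# Toy data (made up for illustration): eight samples, three labels.
samples = [{"a"}, {"a"}, {"a", "b"}, {"b"}, {"a"}, set(), {"b", "c"}, {"c"}]
folds = iterative_stratification(samples, k=2)
print(folds)
```

Processing the rarest label first is the key design choice: examples of rare labels are spread across folds before the more common labels use up the available slots.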