Exploring the consequences of distributed feature selection in DNA microarray data


Microarray data classification is typically regarded as a difficult challenge for machine learning researchers, mainly because the data combine a very large number of features with a small sample size. For this reason, feature selection is usually applied to reduce the dimensionality. However, existing algorithms may not scale well to this number of features, and a possible solution is to distribute the features across several nodes. In this work we explore the distribution process on microarray data, which has recently gained attention, and we evaluate to what extent it is possible to obtain results similar to those obtained with the whole dataset. We performed experiments with different aggregation methods and feature rankers, and also evaluated the effect of distributing the feature ranking process on the subsequent classification performance.
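
As a rough illustration of the idea of distributing features across nodes, ranking them locally, and aggregating the results, the following minimal Python sketch may help. It is not the method evaluated in this work: the ranker (`score_feature`, a normalized difference of class means), the random round-robin partition, the union-of-top-k aggregation, and the toy data are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import random
from statistics import mean, pstdev


def score_feature(values, labels):
    """Hypothetical ranker: |difference of class means| / overall std. dev."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # fall back to 1.0 for constant features
    return abs(mean(pos) - mean(neg)) / spread


def partition_features(n_features, n_nodes, seed=0):
    """Vertical partition: shuffle feature indices, deal them round-robin."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_nodes] for i in range(n_nodes)]


def distributed_selection(X, y, n_nodes, top_k_per_node, seed=0):
    """Rank each node's features independently, then aggregate.

    Aggregation here is simply the union of each node's top-k features
    (an assumption; other aggregation schemes are possible).
    """
    selected = []
    for node_features in partition_features(len(X[0]), n_nodes, seed):
        ranked = sorted(
            node_features,
            key=lambda j: score_feature([row[j] for row in X], y),
            reverse=True,
        )
        selected.extend(ranked[:top_k_per_node])
    return sorted(selected)


# Toy data (made up): 6 samples, 8 features; only feature 2 tracks the label.
rng = random.Random(1)
y = [0, 0, 0, 1, 1, 1]
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in y]
for row, label in zip(X, y):
    row[2] = 10.0 * label

selected = distributed_selection(X, y, n_nodes=2, top_k_per_node=2)
print(selected)
```

Each node sees all samples but only its own subset of features, so the per-node ranking cost shrinks with the number of nodes. The question raised in the abstract is how closely the aggregated selection matches what a ranker would return on the full feature set.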

International Joint Conference on Neural Networks (IJCNN)