A graph-based feature selection method for improving medical diagnosis
Abstract
Classification systems are widely used in the medical domain to explore patient data and extract predictive models. Such models help physicians improve their prognosis, diagnosis, and treatment planning. Models based on data mining and machine learning techniques have been developed to detect disease early or to assist in clinical breast cancer diagnosis. Medical datasets are often characterized by a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. Feature selection, one of the most common and critical tasks in database classification, is routinely applied to improve model performance. It reduces computational cost by removing insignificant features and can help identify the most discriminative feature sets for classifying different cancers. Consequently, the diagnosis process becomes more accurate and easier to interpret. This paper presents a graph-based feature selection method for medical database classification. Six benchmark datasets from the UCI Machine Learning Repository are used in this work. The classification accuracy shows that the proposed method produces good results with fewer features than the original datasets contain.
Keywords
Feature selection; Medical dataset; Graph clustering; Feature clustering