Feature selection algorithm based on PDF/PMF area difference

Viacheslav V. Danilov, Igor P. Skirnevskiy, Roman A. Manakov, Olga M. Gerget, Farid Melgani

Research output: Contribution to journal › Article

Abstract

Purpose: Feature selection has been studied extensively in recent years, and important results have been obtained. However, feature selection remains a complex problem because its outcomes must be reliable and accurate under various constraints. At present, no feature selection algorithm outperforms all others. Our research focuses on developing a feature selection algorithm and comparing it with other well-known feature selection techniques. The proposed algorithm searches for the most reliable features. Although we applied it to a specific case in this study (a catheter detection task), the algorithm is general-purpose and can easily be scaled to find the relevant features of the region of interest in other tasks.

Methods: The proposed algorithm is based on the difference between probability density functions (PDFs) or probability mass functions (PMFs). We introduced a score value to assess how stable a given feature is; the score reflects the degree of intersection of the areas under the density/mass functions. For catheter detection, we described the region of interest with a set of 20 features divided into several groups: morphometric, statistical, intensity-based, textural and geometric. To evaluate the proposed algorithm, we compared its results with those of state-of-the-art feature selection techniques. The feature sets selected by the different algorithms were used to train and test a linear SVM classifier. As an additional estimator, we used a cascade-forward backpropagation (CFB) neural network.
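The abstract does not state the exact ADFS score formula, so the following is only a minimal Python sketch of the general area-intersection idea under loudly stated assumptions: a binary classification task; each class-conditional distribution estimated as a histogram-based PMF over a shared value range with a fixed number of bins (`bins=10` is an arbitrary choice); and a score defined as one minus the histogram intersection (the sum of bin-wise minima), so that 0 means identical class distributions and 1 means no shared area. The functions `class_pmf`, `overlap_score` and `rank_features` are hypothetical names, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def class_pmf(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Histogram estimate of a PMF over [lo, hi] using `bins` equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins if hi > lo else 1.0  # constant feature: everything falls in bin 0
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # put v == hi into the last bin
        counts[idx] += 1
    n = len(values)  # assumed non-empty
    return [c / n for c in counts]


def overlap_score(pos: Sequence[float], neg: Sequence[float], bins: int = 10) -> float:
    """ASSUMED score (not the paper's formula): 1 - histogram intersection.

    0.0 means identical class PMFs, 1.0 means the PMFs share no area.
    """
    lo = min(min(pos), min(neg))
    hi = max(max(pos), max(neg))
    p = class_pmf(pos, lo, hi, bins)
    q = class_pmf(neg, lo, hi, bins)
    intersection = sum(min(a, b) for a, b in zip(p, q))
    return 1.0 - intersection


def rank_features(
    X: Sequence[Sequence[float]], y: Sequence[int], bins: int = 10
) -> Tuple[List[int], List[float]]:
    """Score every column of X for binary labels y (0/1); best features first."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(overlap_score(pos, neg, bins))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, features 1 and 2 overlap.
    X = [
        [0.10, 5.0, 1.0],
        [0.20, 5.1, 2.0],
        [0.15, 4.9, 1.5],
        [0.90, 5.0, 1.0],
        [0.80, 5.2, 2.0],
        [0.85, 4.8, 1.6],
    ]
    y = [0, 0, 0, 1, 1, 1]
    order, scores = rank_features(X, y)
    k = 2  # size of the selected feature subset
    print("ranking:", order, "top-k:", order[:k])
    print("scores:", [round(s, 3) for s in scores])
```

Under these assumptions, a 3-, 6- or 12-feature subset would simply be the first 3, 6 or 12 indices of `order`, which could then be passed to a classifier such as a linear SVM. A kernel density estimate could replace the histogram for continuous features; the histogram is used here only to keep the sketch dependency-free.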
Results: The proposed Area Difference Feature Selection (ADFS) algorithm achieved the following accuracy on the intracardiac catheter dataset: 78.7 ± 3.2 % for a 3-feature subset, 82.3 ± 2.8 % for a 6-feature subset and 86.7 ± 1.5 % for a 12-feature subset. In terms of accuracy, ADFS ranked among the three best algorithms. We also tested ADFS on the breast cancer dataset from the UCI Machine Learning Repository and observed that it outperformed the other compared algorithms on the 3-feature and 12-feature subsets. On the 6-feature subset, only MRMR and MCFS were more accurate than ADFS, by approximately 1.8 % and 0.3 %, respectively. In terms of processing time, the most time-consuming algorithms were FSCM, MRMR, UDFS and RFFS, whereas the fastest were INFS, CFS and ADFS, with processing times of 2 ± 4, 2 ± 5 and 6 ± 16 ms, respectively.

Conclusion: Testing and comparing different feature selection algorithms showed that the proposed algorithm is accurate and effective for various tasks, including medical imaging and visualization.

Original language: English
Article number: 101681
Journal: Biomedical Signal Processing and Control
Volume: 57
Publication status: Published (Mar 2020)

Keywords

  • Catheter detection
  • Feature engineering
  • Feature selection
  • Machine learning
  • Segmentation
  • SVM

ASJC Scopus subject areas

  • Signal Processing
  • Health Informatics