Abstract
Genetic algorithms (GAs) have recently been used as a search method for training set selection in supervised machine learning, on the assumption that not all data are equally useful for training supervised algorithms. In this paper, we empirically study the performance of a classical GA for selecting a ‘good’ training set for decision tree classifiers. We also discuss different fitness functions and their influence on the results. The experiments use a set of widely used classification datasets from Kaggle and the UCI Machine Learning Repository. Empirical results show that improved generalization can indeed be obtained using this approach.