Show simple item record

dc.contributor.author: Akkaya, Berke
dc.date.accessioned: 2021-12-10T10:11:45Z
dc.date.available: 2021-12-10T10:11:45Z
dc.identifier.citation: Akkaya B., "The Effect of Recursive Feature Elimination with Cross-Validation Method on Classification Performance with Different Sizes of Datasets", 4th International Conference on Data Science & Applications, İstanbul, Türkiye, 4 - 6 June 2021, pp. 142-152
dc.identifier.other: av_2febfa84-2e47-4e77-a1ee-177d9f9ef7b3
dc.identifier.other: vv_1032021
dc.identifier.uri: http://hdl.handle.net/20.500.12627/169394
dc.identifier.uri: https://avesis.istanbul.edu.tr/api/publication/2febfa84-2e47-4e77-a1ee-177d9f9ef7b3/file
dc.description.abstract: The high-dimensionality problem, one of the difficulties encountered in classification tasks, arises when a dataset contains too many features. It reduces the success of classification models and increases training time. Feature selection is one of the methods used to address this problem; it is defined as selecting the best subset of features that can represent the original dataset. The goal is to reduce the dimensionality of the data by keeping only the features that are most useful and important for the problem at hand. In this study, the performance of various classification algorithms on datasets of different sizes was compared using recursive feature elimination with cross-validation, a feature selection method that seeks the most accurate result by repeatedly eliminating the least important features under cross-validation. The study used datasets containing balanced binary classification problems. Accuracy, ROC-AUC score, and fit time were used as evaluation metrics, while Logistic Regression, Support Vector Machines, Naive Bayes, k-Nearest Neighbors, Stochastic Gradient Descent, Decision Tree, AdaBoost, Multilayer Perceptron, and XGBoost classifiers were used as classification algorithms. Examining the results of recursive feature elimination with cross-validation showed that, on average, accuracy increased by 5%, the ROC-AUC score increased by 5.3%, and fit time decreased by about 5.1 seconds. Naive Bayes and Multilayer Perceptron were the classifiers most sensitive to feature selection, since their classification performance improved the most after it.
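The procedure the abstract describes, recursive feature elimination with cross-validation over a balanced binary classification problem, scored by ROC-AUC, can be sketched with scikit-learn's `RFECV`. This is a minimal illustration, not the study's actual pipeline: the synthetic dataset, the choice of logistic regression as the base estimator, and all parameter values are assumptions for demonstration.

```python
# Sketch of recursive feature elimination with cross-validation (RFECV).
# Synthetic data and estimator choice are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# A balanced binary classification dataset, as in the study's setting.
X, y = make_classification(n_samples=500, n_features=25, n_informative=5,
                           n_redundant=5, weights=[0.5, 0.5], random_state=0)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                  # eliminate one least-important feature per round
    cv=StratifiedKFold(5),   # cross-validation guides when to stop eliminating
    scoring="roc_auc",       # ROC-AUC, one of the study's evaluation metrics
)
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("selected feature mask:", selector.support_)
```

After fitting, `selector.support_` is a boolean mask over the original features, and `selector.transform(X)` yields the reduced dataset on which the downstream classifiers would then be trained and timed.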
dc.language.iso: eng
dc.subject: MULTIDISCIPLINARY SCIENCES
dc.subject: PSYCHOLOGY, MATHEMATICAL
dc.subject: Statistics
dc.subject: Statistical Analysis and Applications
dc.subject: Basic Sciences
dc.subject: Natural Sciences, General
dc.subject: Multidisciplinary
dc.subject: Psychology
dc.subject: Basic Sciences (SCI)
dc.title: The Effect of Recursive Feature Elimination with Cross-Validation Method on Classification Performance with Different Sizes of Datasets
dc.type: Conference paper
dc.contributor.department: İstanbul Üniversitesi, Faculty of Business Administration, Department of Business Administration
dc.contributor.firstauthorID: 2725033


Files in this item

There are no files associated with this item.