Impact of Characteristics in Viral Integration Hotspots on Classification Performance
Abstract
Genetic defects arise mainly from disorders in the gene regions that code for the proteins necessary for normal body functions. With gene therapy, the disordered regions can be detected and their genetic content can be permanently changed. These regions may have special characteristics in terms of nucleotide dispersion that fall outside the known statistical norms of the genome. In this study, one such characteristic is defined, and its effect on predicting the strand direction of genomic reads (a classification task) is analyzed. The analyses show that the Canonical Correlation Analysis (CCA) method outperforms the well-known Support Vector Machine (SVM) approach in discriminating reads according to their strand direction.
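The abstract does not specify the characteristic or the features used, so the following is only a minimal illustrative sketch of why nucleotide composition can carry strand information. It assumes a hypothetical feature, the per-read relative frequency of each nucleotide, and shows that reading the same fragment from the opposite strand swaps the A/T and C/G frequencies. The function names and the example read are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read: str) -> str:
    """Return the read as it would appear on the opposite strand."""
    return read.translate(COMPLEMENT)[::-1]


def nucleotide_frequencies(read: str) -> dict:
    """Relative frequency of A, C, G, T in the read (hypothetical feature)."""
    counts = Counter(read)
    return {base: counts.get(base, 0) / len(read) for base in "ACGT"}


# Made-up example read; real reads would come from sequencing data.
forward = "AAGCTAAC"
reverse = reverse_complement(forward)

print(forward, nucleotide_frequencies(forward))
print(reverse, nucleotide_frequencies(reverse))
```

In this toy case the forward read is A-rich while its reverse complement is T-rich, so a composition-based feature vector differs between the two orientations. A classifier such as CCA or SVM would be trained on feature vectors of this general kind, although the actual features in the study may differ.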