New Hierarchical Clustering Algorithm for Protein Sequences Based on Hellinger Distance |
|
PP: 1541-1549 |
|
doi:10.18576/amis/100432
|
|
Author(s) |
|
Gamil Abdel-Azim,
|
|
Abstract |
|
Clustering protein sequences by their sequence patterns has attracted considerable research effort over the last decade.
The central question in most clustering systems is how to represent and interpret protein sequences, since this largely determines
the performance of classifiers. In this paper, we propose a new methodology that defines a new descriptor to represent and interpret
each sequence by its probability density function (PDF). The Hellinger distance is used to measure the similarity between
sequences. A hierarchical algorithm is then applied to cluster the protein sequences using the Hellinger distance. Two
protein data sets are used in the experiments: the first is a mixture of Influenza and Ebola virus sequences, and the second is a set of
Influenza sequences. We compare two hierarchical clustering algorithms. The first bases its similarity measure on methods that use
sequence alignment (HCAWSA). The second, the proposed approach, bases its similarity measure on methods that do not use sequence
alignment (HCAWOSA). The experimental results show that the proposed methodology is feasible and achieves good accuracy. |
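
The alignment-free pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the PDF descriptor is built, so the code below assumes the simplest candidate, the relative frequency of each of the 20 standard amino acids; the linkage rule (average linkage) and the function names `composition`, `hellinger`, and `cluster` are likewise assumptions made for illustration and are not taken from the paper. For two discrete distributions p and q, the Hellinger distance is H(p, q) = sqrt( (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2 ), which lies between 0 and 1.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Assumed descriptor: relative frequency of each standard amino acid."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def cluster(seqs, k):
    """Agglomerative clustering (average linkage, an assumption) down to k clusters.

    Returns a list of clusters, each a sorted list of indices into seqs.
    """
    pdfs = [composition(s) for s in seqs]
    n = len(seqs)
    # Pairwise Hellinger distances; no sequence alignment is involved.
    d = [[hellinger(pdfs[i], pdfs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all cross-cluster pairs.
                link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy input (not real viral sequences): two A/C-rich and two W/Y-rich strings.
toy = ["AAAAAC", "AAAACC", "WWWWWY", "WWWWYY"]
groups = cluster(toy, k=2)
```

On this toy input the two A/C-rich strings fall into one cluster and the two W/Y-rich strings into the other. A composition-only descriptor ignores residue order, so a real study would likely need a richer PDF, for example one built over k-mers.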