Natural Sciences Publishing

Mathematical Sciences Letters

An International Journal

	Vol. 4, No. 2


	Persian and Arabic Text Recognition with NN, Decision Tree and K-Nearest Neighbor

	PP: 209-217

	Author(s)

	Hamid Parvin, Reza Parvin

	Abstract

	A thesaurus is a reference work that lists words grouped by similarity of meaning (synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations. This paper proposes an approach to improving the classification performance of Persian texts by exploiting a very large thesaurus. The method is flexible: it recognizes and categorizes Persian texts using the thesaurus as auxiliary knowledge. On the corpus, enabling the thesaurus yields a more representative set of word frequencies than the set obtained with the thesaurus disabled. The thesaurus encodes two types of word relationships. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. The k-nearest neighbor classifier, the decision tree classifier, and the k-means clustering algorithm are applied to the frequency-based features. Experimental results indicate that enabling the thesaurus significantly improves performance in both text classification and clustering.
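
	The idea of thesaurus-aware word frequencies feeding a k-nearest neighbor classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English thesaurus, the whitespace tokenizer, the cosine similarity measure, and the names THESAURUS, features, cosine and knn_predict are all assumptions made for illustration. The abstract does not specify the two relationship types, so the sketch models only a simple synonym-to-representative mapping.

```python
from collections import Counter
import math

# Hypothetical toy thesaurus (an assumption, not the paper's resource):
# each word maps to one canonical representative of its synonym group.
THESAURUS = {
    "automobile": "car",
    "vehicle": "car",
    "physician": "doctor",
}


def features(text, use_thesaurus=True):
    """Return word frequencies, optionally merging synonyms first."""
    tokens = text.lower().split()
    if use_thesaurus:
        # Words missing from the thesaurus are kept unchanged.
        tokens = [THESAURUS.get(t, t) for t in tokens]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two frequency vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3, use_thesaurus=True):
    """Label `text` by majority vote among its k most similar documents.

    `train` is a list of (document_text, label) pairs.
    """
    query = features(text, use_thesaurus)
    scored = sorted(
        ((cosine(query, features(doc, use_thesaurus)), label)
         for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ("doctor hospital patient", "health"),
        ("car engine road", "transport"),
    ]
    # With the thesaurus enabled, "automobile" is counted as "car",
    # so the query shares a feature with the transport document.
    print(knn_predict(train, "automobile", k=1))
```

	With the thesaurus disabled, the query "automobile" shares no word with either training document, so every similarity is zero and the vote carries no information. Merging synonyms before counting is one simple way to obtain the more representative frequencies the abstract describes.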

Copyright naturalspublishing.com. All Rights Reserved