Summary:
Measuring the similarity between documents is an important operation in text processing. In this paper, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (a) the feature appears in both documents, (b) the feature appears in only one document, and (c) the feature appears in neither document. For the first case, the similarity increases as the difference between the two feature values decreases, and the contribution of the difference is normally scaled. For the second case, a fixed value is contributed to the similarity. For the last case, the feature makes no contribution to the similarity. The proposed measure is also extended to gauge the similarity between two sets of documents. Its effectiveness is evaluated on several real-world data sets for text classification and clustering problems, and the results show that the proposed measure performs better than the other measures compared.
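The three per-feature cases described above can be sketched in code. This is a minimal illustration, not the paper's exact method: the concrete formulas are assumptions chosen for clarity. A shared feature contributes `0.5 * (1 + exp(-((a - b) / s) ** 2))`, where `s` is a hypothetical per-feature standard deviation used as the scaling factor. A one-sided feature contributes a fixed penalty `-lam`, and a feature absent from both documents is skipped entirely. The function names `pairwise_similarity` and `feature_stddevs` and the parameters `sigma` and `lam` are likewise hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def feature_stddevs(corpus: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: population standard deviation of each feature
    (column) across the corpus, used as the per-feature scaling factor."""
    return [statistics.pstdev(column) for column in zip(*corpus)]


def pairwise_similarity(d1: Sequence[float], d2: Sequence[float],
                        sigma: Sequence[float], lam: float = 1.0) -> float:
    """Sketch of a three-case document similarity (assumed formulas).

    Per feature j with values a = d1[j], b = d2[j]:
      (a) both present -> 0.5 * (1 + exp(-((a - b) / s) ** 2)), in (0.5, 1]
      (b) one present  -> -lam (fixed contribution)
      (c) both absent  -> skipped (no contribution, not counted)
    The contributions are averaged over the counted features, giving a value
    in [-lam, 1], which is then mapped into [0, 1].
    """
    if not (len(d1) == len(d2) == len(sigma)):
        raise ValueError("d1, d2 and sigma must have the same length")

    total = 0.0
    counted = 0
    for a, b, s in zip(d1, d2, sigma):
        if a > 0 and b > 0:
            # Case (a): the smaller the scaled difference, the larger the term.
            diff = (a - b) / s if s > 0 else 0.0
            total += 0.5 * (1.0 + math.exp(-diff * diff))
            counted += 1
        elif a > 0 or b > 0:
            # Case (b): the feature appears in only one document.
            total -= lam
            counted += 1
        # Case (c): the feature appears in neither document; ignore it.

    if counted == 0:
        # Assumption: two documents with no features at all get similarity 0.
        return 0.0
    average = total / counted
    return (average + lam) / (1.0 + lam)


if __name__ == "__main__":
    docs = [[1.0, 2.0, 0.0, 0.0],
            [1.0, 0.0, 3.0, 0.0],
            [2.0, 2.5, 0.0, 0.0]]
    sigma = feature_stddevs(docs)
    print(pairwise_similarity(docs[0], docs[1], sigma))
    print(pairwise_similarity(docs[0], docs[2], sigma))
```

With `lam = 1.0`, identical documents score 1 and documents that share no features score 0. Raising `lam` penalizes one-sided features more heavily.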
Technologies Used: ASP.NET MVC, MS SQL Server, JavaScript, HTML, CSS, Bootstrap, Entity Framework
Modules: NA
Algorithm Used: NA