Discrepancies Detection in Arabic and English Documents
Abstract
This paper analyzes and compares the results of methods for detecting discrepancies in English and Arabic documents based on character n-gram profiles (the set of normalized character n-gram frequencies of a text). English and Arabic texts were analyzed with respect to a number of statistical characteristics. We describe some statistical differences between the two languages and apply heuristics to measure the dissimilarity between parts of a text. For each text, the results indicate whether it deserves closer attention because its parts may not have been written by the same author. We evaluate several Arabic and English documents and show which of their parts contain discrepancies and therefore require further analysis for plagiarism detection. The analysis depends on parameters selected experimentally.
Keywords
n-grams; stylistic measure; plagiarism; authorship
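
As an illustration of the character n-gram profile defined in the abstract, the following is a minimal sketch, not the implementation used in the paper. The function names `ngram_profile` and `profile_distance`, the default n = 3, and the dissimilarity measure (a plain sum of absolute frequency differences) are assumptions chosen for illustration only; the heuristics actually applied in the paper may differ.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Map each character n-gram of `text` to its normalized frequency.

    Works on any Unicode string, so English and Arabic text are handled alike.
    The default n=3 is an assumption, not a value taken from the paper.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}  # text shorter than n has no n-grams
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def profile_distance(p, q):
    """Illustrative dissimilarity between two profiles (assumed measure).

    Sums the absolute differences of normalized frequencies over the union
    of n-grams; 0 means identical profiles, 2 means no n-gram in common.
    """
    keys = set(p) | set(q)
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in keys)
```

In use, one could compute the profile of a whole document and of each of its parts, then flag the parts whose distance from the whole-document profile is unusually large as candidates for closer inspection.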