Faculty of Engineering, University of Guilan, Rasht, Iran
Abstract
Paraphrases are different expressions of the same meaning. Recognizing paraphrase sentences or phrases is an important task in natural language processing systems, but no Persian paraphrase corpus has been developed yet. In this paper, we present such a corpus, built with an automatic, unsupervised method for extracting paraphrases. Paraphrase candidates are extracted from news agency data and internet news web pages by an algorithm based on Jaccard edit distance, and are grouped into three classes: paraphrase, not paraphrase, and irrelevant. Unlike many other approaches, paraphrase phrases are extracted as well as paraphrase sentences. Next, a new crowdsourcing approach based on a Telegram messaging bot is used to assign actual labels to each extracted candidate pair. The judged pairs are evaluated and the final corpus is created. The Degarbayan corpus consists of 1,523 pairs of paraphrases, and the first version of the corpus is available online for academic purposes.
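The abstract does not spell out the extraction algorithm, so the following is only a minimal sketch of how a Jaccard-style score could sort candidate pairs into the three classes named above. It uses plain token-set Jaccard similarity rather than the authors' "Jaccard edit distance", and the function names and both thresholds are hypothetical values chosen for illustration.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A ∩ B| / |A ∪ B| over whitespace tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def classify_pair(a: str, b: str,
                  paraphrase_threshold: float = 0.5,   # hypothetical cut-off
                  related_threshold: float = 0.2) -> str:  # hypothetical cut-off
    """Map a similarity score onto the three classes from the abstract."""
    score = jaccard_similarity(a, b)
    if score >= paraphrase_threshold:
        return "paraphrase"
    if score >= related_threshold:
        return "not paraphrase"
    return "irrelevant"
```

In a pipeline like the one described, such automatic labels would only be candidates; the crowdsourced judgments decide the final label for each pair.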
Maanijou, Reza and Mirroshandel, Seyed Abolghasem. (1396). Degarbayan: Developing a Persian Paraphrase Corpus by Crowd Sourcing. علوم رایانش و فناوری اطلاعات, 15(1), e206904.