
A Plagiarized Source Retrieval System Developed using Efficient Download Filtering and POS Tagged Query Formulation with Effective Paragraph based chunking

Riya Ravi N, Deepa Gupta


Source retrieval is an important task of an external plagiarism detection system: it involves identifying a set of candidate source documents for a given suspicious document. It is crucial not to lose any actual source document while reducing the size of the candidate source document set. This paper describes an approach to the source retrieval task of an external plagiarism detection system. The approach combines paragraph-based chunking of documents with Part-of-Speech tagging and an efficient download filtering method. The proposed system is evaluated against the PAN 2011-12, PAN 2012-13, and PAN 2014-15 test data sets, and the results are analysed and compared using the standard PAN measures: recall, precision, F-measure, and the average number of queries and downloads. In PAN 2015, conducted by the PAN CLEF evaluation lab, the proposed approach showed improved efficiency, achieving the highest F-measure and precision values along with the fewest downloads. The results are further improved by adding efficient query and download filtering mechanisms to the proposed system. The effect of this enhanced system is also discussed and analysed in this paper.
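To make the evaluation measures named in the abstract concrete, the following is a minimal sketch of set-based precision, recall, and F-measure for a single suspicious document. It is a simplification and not the authors' or PAN's evaluation code: the official PAN evaluation may define true positives differently (for instance, in how near-duplicate documents are treated). The function name `retrieval_scores` and the document IDs are hypothetical, chosen only for illustration.

```python
def retrieval_scores(retrieved, actual_sources):
    """Set-based precision, recall and F-measure for one suspicious document.

    Simplified illustration: a retrieved document counts as a hit only if its
    ID appears in the set of actual sources.
    """
    retrieved, actual_sources = set(retrieved), set(actual_sources)
    hits = len(retrieved & actual_sources)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(actual_sources) if actual_sources else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical document IDs, for illustration only.
downloaded = ["doc-a", "doc-b", "doc-c", "doc-d"]
true_sources = ["doc-a", "doc-c", "doc-e"]
precision, recall, f_measure = retrieval_scores(downloaded, true_sources)
```

In this toy case two of the four downloaded documents are actual sources, and two of the three actual sources were found. Downloading fewer irrelevant documents raises precision without changing recall, which is the trade-off that download filtering targets.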


Source Retrieval, External Plagiarism Detection System, POS Tagging, PAN, ClueWeb09 corpus, API Search Engine, TF-IDF.


