OntoHindi NER - An Ontology Based Novel Approach for Hindi Named Entity Recognition

Arti Jain, Devendra K. Tayal, Anuja Arora

Abstract

Named Entity Recognition (NER) is the identification and classification of Named Entities (NEs) in a given text. Language-specific and domain-specific NER systems are an active area of research; accordingly, this paper proposes OntoHindi NER, an ontology-based NER methodology for the Hindi language in the health domain. Hindi Health Data (HHD) is crawled from the following Indian websites: Traditional Knowledge Digital Library, Ministry of Ayush, University of Patanjali, and Linguistic Data Consortium for Indian Languages. The HHD corpus comprises 310,530 words, and the NE types considered are Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP). OntoHindi NER maps an ontology for health data to recognize NEs, maintaining hierarchical information about the ontological category of the HHD corpus words. OntoHindi NER comprises six stages: (1) preparing gazetteer lists (four initial seed lists, extended using Hindi WordNet synsets), (2) HHD pre-processing, (3) HHD feature engineering, (4) string-matching based feature engineering (Levenshtein distance and Linguistic based Improved Lin’s Matcher (LILM)), (5) Concept Hierarchy based Mapping (CHM), and (6) COncept Selection and Aggregation (COSA). CHM structures ontological mappings (1:1, 1:m, and m:1) on the basis of ontological semantic relationships, and COSA forms ontology-based NE clusters on the HHD corpus using the silhouette measure. Cluster aggregation is then performed with standard k-means clustering, which groups the HHD corpus into distinct NE clusters. The performance of OntoHindi NER is evaluated using four measures: precision, recall, F-score, and model fitting time. The results are validated through 5-fold cross-validation.
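
The abstract does not give implementation details, so the following is only a minimal Python sketch, under stated assumptions, of how the Levenshtein-distance string matching mentioned above could tag corpus tokens against seed gazetteer lists for the four NE types (PER, DIS, CNS, SMP). The gazetteer entries, the `classify` and `similarity` helpers, and the 0.8 similarity threshold are hypothetical illustrations, not the paper's data or parameters; LILM, CHM, and COSA are not shown.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalisation of the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical seed gazetteers (illustrative only, NOT the paper's lists).
GAZETTEERS: Dict[str, List[str]] = {
    "PER": ["चरक", "सुश्रुत"],
    "DIS": ["मधुमेह", "पीलिया"],
    "CNS": ["हल्दी", "तुलसी"],
    "SMP": ["बुखार", "खांसी"],
}


def classify(token: str, threshold: float = 0.8) -> Optional[Tuple[str, str, float]]:
    """Return (tag, closest gazetteer entry, similarity), or None if no entry reaches the threshold."""
    best: Optional[Tuple[str, str, float]] = None
    for tag, entries in GAZETTEERS.items():
        for entry in entries:
            score = similarity(token, entry)
            if score >= threshold and (best is None or score > best[2]):
                best = (tag, entry, score)
    return best


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    for word in ["मधुमेह", "मधूमेह", "हल्दी", "xyz"]:
        print(word, classify(word))
```

Because the distance is computed over Unicode code points, replacing one Devanagari vowel sign with another (for example ु with ू) counts as a single edit, so a near-miss spelling of a gazetteer word can still be tagged; a grapheme-cluster-aware comparison would be an alternative design choice.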

Keywords

Named entity recognition, Hindi, health domain, ontology, string matching, Levenshtein distance, LILM, Concept Hierarchy based Mapping, COncept Selection and Aggregation
