Probabilistic Named Entity Recognition for nonstandard format entities using cooccurrence word embeddings

Alshehabi Al-Ani, Jabir (ORCID: https://orcid.org/0000-0002-0553-2538) and Fasli, M. (2020) Probabilistic Named Entity Recognition for nonstandard format entities using cooccurrence word embeddings. In: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019. IEEE

Full text not available from this repository.

Abstract

Short text has become widespread on social media platforms such as Twitter and Facebook. Users on these platforms typically adopt nonstandard format terms when posting. This introduces challenges for Information Retrieval (IR) and Natural Language Processing (NLP), and standard or classical methods tend not to perform well in this domain. In this paper, we address one of these challenges in IR, Named Entity Recognition (NER). We introduce a novel probabilistic approach that targets entities occurring in an informal (nonstandard) format within short text. The Probabilistic Named Entity Recognition (PNER) model identifies these entities using cooccurrence patterns. These patterns were detected using the word cooccurrence embeddings of 278.6 million tweets. The results show an enhancement of 7% on two standard methods when used in combination with PNER. The testing dataset was created using the standard methods in addition to street names and places taken from the Open Street Map (OSM) database. © 2019 IEEE.

Item Type: Book Section
Status: Published
DOI: 10.1109/BigData47090.2019.9005587
School/Department: School of Science, Technology and Health
URI: https://ray.yorksj.ac.uk/id/eprint/7562
