Named Entity Recognition (NER)
Named Entity Recognition (NER) is a key task in Natural Language Processing (NLP) that locates named entities in text and classifies them into predefined categories.
1. Definition
NER = Extracting “names” and other important entities from text and assigning each one a category.
- Input: Unstructured text
- Output: Labeled entities (each entity's text span plus its type)
Common Entity Types:
- Person → Names of people
- Organization → Companies, institutions
- Location → Cities, countries, landmarks
- Date/Time → Dates, years, times
- Money → Monetary amounts and currencies
- Miscellaneous → Products, events, titles of works
2. Example
Text:
Apple Inc. was founded by Steve Jobs in Cupertino in 1976.
NER Output:
| Entity | Type |
|---|---|
| Apple Inc. | Organization |
| Steve Jobs | Person |
| Cupertino | Location |
| 1976 | Date |
3. How NER Works
- Text Preprocessing: Tokenization, cleaning
- Entity Detection: Identify which words or multi-word spans are entities (e.g. "Steve Jobs" is one entity)
- Entity Classification: Assign each detected entity its category
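The three steps above can be sketched as a tiny rule-based pipeline that reproduces the table from Section 2. This is a toy under stated assumptions: the `GAZETTEER` dictionary, the year regex, and the function names `preprocess`, `detect`, `classify`, and `ner` are hypothetical and exist only for illustration. A real system would rely on far larger resources or a trained model.

```python
import re

# Hypothetical dictionary (gazetteer) for illustration; keys are token sequences.
GAZETTEER = {
    ("Apple", "Inc", "."): "Organization",
    ("Steve", "Jobs"): "Person",
    ("Cupertino",): "Location",
}
MAX_LEN = max(len(key) for key in GAZETTEER)
YEAR = re.compile(r"(1[89]|20)\d\d")  # four-digit years 1800-2099

def preprocess(text):
    """Step 1: tokenize into (token, start_char, end_char) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def detect(tokens):
    """Step 2: find token spans [i, j) that are entities (longest dictionary match, or a year)."""
    words = [tok for tok, _, _ in tokens]
    spans, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in GAZETTEER:
                spans.append((i, i + n))
                i += n
                break
        else:  # no dictionary entry starts at position i
            if YEAR.fullmatch(words[i]):
                spans.append((i, i + 1))
            i += 1
    return spans

def classify(tokens, span):
    """Step 3: assign a category; spans not in the dictionary were matched as years."""
    key = tuple(tok for tok, _, _ in tokens[span[0]:span[1]])
    return GAZETTEER.get(key, "Date")

def ner(text):
    """Run all three steps and return (entity_text, type) pairs using character offsets."""
    tokens = preprocess(text)
    return [(text[tokens[i][1]:tokens[j - 1][2]], classify(tokens, (i, j)))
            for i, j in detect(tokens)]

print(ner("Apple Inc. was founded by Steve Jobs in Cupertino in 1976."))
# → [('Apple Inc.', 'Organization'), ('Steve Jobs', 'Person'), ('Cupertino', 'Location'), ('1976', 'Date')]
```

Keeping character offsets from tokenization lets the pipeline return the exact surface text ("Apple Inc." rather than "Apple Inc ."), and longest-match-first detection keeps multi-word entities together.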
Approaches:
- Rule-based: Use dictionaries (gazetteers), hand-written patterns, and regular expressions
- Machine Learning: Train classifiers or sequence-labeling models on annotated data
- Deep Learning: Use RNNs or Transformers (such as BERT) for context-aware recognition
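Machine-learning and deep-learning taggers commonly treat NER as token classification: the model assigns each token a tag, and the tags are then grouped into entity spans. The sketch below assumes the widely used BIO scheme (B- begins an entity, I- continues it, O is outside any entity), which is not described above. The tag sequence is hand-written to stand in for a model's output, and `bio_to_spans` is a hypothetical helper name.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_text, type) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts (an I- tag with a different type is treated leniently as a start).
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current entity
        else:  # "O": outside any entity, so close the open span if there is one
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

# Hypothetical tagger output for the example sentence.
tokens = ["Apple", "Inc.", "was", "founded", "by", "Steve", "Jobs", "in", "Cupertino", "in", "1976"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE"]
print(bio_to_spans(tokens, tags))
# → [('Apple Inc.', 'ORG'), ('Steve Jobs', 'PER'), ('Cupertino', 'LOC'), ('1976', 'DATE')]
```

The B-/I- distinction is what lets a token-level classifier represent multi-word entities and two adjacent entities of the same type.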
4. Applications
- Search engines → Understand queries better
- Chatbots & Virtual Assistants → Recognize names, locations, dates
- Information Extraction → Pull structured data from documents
- Healthcare → Extract medical entities like diseases, drugs
- Finance → Detect companies, stock symbols, monetary amounts
5. Challenges
- Ambiguity: “Apple” could be a fruit or a company
- Context sensitivity: The same word can belong to different entity types in different contexts (e.g. "Washington" as a person or a location)
- Multilingual text: Each language may need its own model or a model trained on many languages
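The ambiguity and context-sensitivity problems can be illustrated with a toy rule that looks at the words around "Apple". The cue list `ORG_CUES`, the window size, and the function `label_apple` are hypothetical; a trained model learns such contextual signals from labeled data rather than from a fixed list.

```python
# Hypothetical cue words suggesting the company sense of "Apple".
ORG_CUES = {"Inc.", "CEO", "shares", "stock", "iPhone", "announced"}

def label_apple(tokens, i, window=3):
    """Label tokens[i] ("Apple") using the words within `window` positions on either side."""
    context = set(tokens[max(0, i - window):i]) | set(tokens[i + 1:i + 1 + window])
    # Company sense if any cue appears nearby; otherwise treat it as the fruit (no entity).
    return "Organization" if context & ORG_CUES else None

print(label_apple("Apple shares rose sharply".split(), 0))  # → Organization
print(label_apple("Apple slices taste sweet".split(), 0))   # → None
```

A plain dictionary lookup would give both sentences the same label; only the surrounding words separate the two senses.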
6. Summary
NER = Automatic identification and classification of key entities in text. It is a key step in turning unstructured text into structured data.