Overview
Three entity extraction functions are available in microservice.py:
- get_people_names() - Extracts person names using NLP tagging
- get_dates() - Identifies dates in various formats
- find_country() - Detects country names
Extracting People Names
The get_people_names() function uses NLTK’s part-of-speech tagging to identify proper nouns that represent people’s names.
Function Reference
microservice.py:174-186
How It Works
Tokens tagged as “NNP” (proper noun, singular) are extracted. These typically represent names of people, places, or organizations.
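The selection step can be sketched as follows. This is a minimal illustration, not the code in microservice.py: the helper name extract_proper_nouns is hypothetical, and it takes already-tagged (token, tag) pairs, the shape that nltk.pos_tag returns, so the example runs without NLTK's models installed.

```python
from typing import Iterable, List, Tuple


def extract_proper_nouns(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Return tokens tagged "NNP" (hypothetical helper, for illustration only)."""
    return [token for token, tag in tagged_tokens if tag == "NNP"]


# Hand-written tags in the (token, tag) shape produced by nltk.pos_tag.
tagged = [
    ("Alice", "NNP"),
    ("met", "VBD"),
    ("Bob", "NNP"),
    ("in", "IN"),
    ("Paris", "NNP"),
    (".", "."),
]

proper_nouns = extract_proper_nouns(tagged)
print(proper_nouns)  # ['Alice', 'Bob', 'Paris']
```

Note that "Paris" is kept: the tag alone does not distinguish a person from a place.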
Example Usage
The function returns all proper nouns, which may include place names. The API automatically filters out detected countries and dates from the names list.
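A minimal sketch of that filtering step is shown here. The function name remove_countries_and_dates and the data shapes (lists of strings) are assumptions made for illustration; the actual logic in microservice.py may differ.

```python
from typing import Iterable, List


def remove_countries_and_dates(
    names: Iterable[str],
    countries: Iterable[str],
    dates: Iterable[str],
) -> List[str]:
    """Drop candidate names that are words of a detected country or date.

    Hypothetical helper: phrases such as "United Kingdom" or "March 2009"
    are split on whitespace so each single-token candidate can be compared.
    """
    excluded = set()
    for phrase in list(countries) + list(dates):
        excluded.update(phrase.split())
    return [name for name in names if name not in excluded]


candidates = ["Alice", "France", "March", "Bob"]
cleaned = remove_countries_and_dates(
    candidates,
    countries=["France"],
    dates=["March 2009"],
)
print(cleaned)  # ['Alice', 'Bob']
```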
Extracting Dates
The get_dates() function uses regular expressions to identify dates in multiple formats.
Function Reference
microservice.py:122-160
Supported Date Formats
The function recognizes these date patterns:

| Format | Example | Pattern |
|---|---|---|
| Numeric with separators | 25/12/2023, 25-12-2023 | dd/mm/yyyy, dd-mm-yyyy |
| Day Month Year | 25 December 2023 | dd MMM yyyy |
| Month-Day-Year | Mar-20-2009, Mar 20, 2009 | MMM-dd-yyyy, MMM dd, yyyy |
| Month Year | March 2009, Mar 2009 | MMM yyyy |
Example Usage
If no dates are found, the function returns “-” instead of an empty string.
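The formats in the table, together with the “-” fallback, can be approximated with the sketch below. The regular expressions, the function name find_dates, and the "; " separator between matches are all assumptions for illustration and are not taken from microservice.py.

```python
import re

# Full or abbreviated English month names.
MONTH = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)

# More specific alternatives come first so "March 2009" is only tried
# after the day-bearing formats have failed at the same position.
DATE_RE = re.compile(
    r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b"                # 25/12/2023, 25-12-2023
    r"|\b\d{1,2}\s+" + MONTH + r"\s+\d{4}\b"           # 25 December 2023
    r"|\b" + MONTH + r"[-\s]\d{1,2},?[-\s]\d{4}\b"     # Mar-20-2009, Mar 20, 2009
    r"|\b" + MONTH + r"\s+\d{4}\b"                     # March 2009, Mar 2009
)


def find_dates(text: str) -> str:
    """Return all matched dates joined by "; ", or "-" when nothing matches."""
    matches = [m.group(0) for m in DATE_RE.finditer(text)]
    return "; ".join(matches) if matches else "-"


sample = "Meeting on 25/12/2023, follow-up on Mar 20, 2009 and review in March 2010."
print(find_dates(sample))            # 25/12/2023; Mar 20, 2009; March 2010
print(find_dates("No dates here."))  # -
```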
Extracting Countries
The find_country() function uses the pycountry library to detect country names in text.
Function Reference
microservice.py:162-172
How It Works
The function iterates through all countries in the pycountry database and checks if each country’s official name appears in the input text.
Example Usage
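The lookup can be illustrated with a dependency-free sketch. The function name find_countries and the SAMPLE_COUNTRIES list are hypothetical stand-ins; in the service the names come from pycountry rather than a hard-coded list.

```python
from typing import Iterable, List


def find_countries(text: str, country_names: Iterable[str]) -> List[str]:
    """Return every name from country_names that occurs in text (case-sensitive)."""
    return [name for name in country_names if name in text]


# Hypothetical stand-in list so the example has no third-party dependency.
# With pycountry installed, the names could instead be built with
# [country.name for country in pycountry.countries].
SAMPLE_COUNTRIES = ["France", "Germany", "Japan", "United Kingdom"]

found = find_countries("She flew from Japan to France last week.", SAMPLE_COUNTRIES)
print(found)  # ['France', 'Japan']
```

Results follow the order of the name list, not the order of appearance in the text. A plain substring check like this one also matches a name embedded in a longer word, so a word-boundary regex may be preferable if that matters.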
Supported Countries
The function recognizes all official country names from the ISO 3166-1 standard, including:
- United States
- United Kingdom
- France
- Germany
- Japan
- And all other internationally recognized countries
Integration with API
All three entity extraction functions are called automatically when using the /textbased_emotion endpoint, and the extracted entities are included in the response.
Advanced Usage
Customizing Date Patterns
To add support for additional date formats, extend the get_dates() function with new regex patterns.
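One way to add a date format is sketched below, assuming the patterns are kept in a list that the matching code loops over. DATE_PATTERNS and match_dates are hypothetical names; the real get_dates() may organize its regexes differently.

```python
import re
from typing import List

# Hypothetical pattern list with one of the existing formats.
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b"),  # dd/mm/yyyy, dd-mm-yyyy
]

# New format: ISO 8601 calendar dates such as 2023-12-25.
DATE_PATTERNS.append(re.compile(r"\b\d{4}-\d{2}-\d{2}\b"))


def match_dates(text: str) -> List[str]:
    """Collect matches from every registered pattern, in pattern order."""
    matches: List[str] = []
    for pattern in DATE_PATTERNS:
        matches.extend(pattern.findall(text))
    return matches


result = match_dates("Released 2023-12-25, patched 02/01/2024.")
print(result)  # ['02/01/2024', '2023-12-25']
```

The word-boundary anchors keep the older dd-mm-yyyy pattern from matching inside an ISO date, so the two formats do not produce overlapping results here.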
Improving Name Extraction Accuracy
For better name extraction, consider:
- Using Named Entity Recognition (NER) models like spaCy instead of simple POS tagging
- Filtering out common non-name proper nouns
- Implementing context-based filtering
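The second suggestion, filtering out common non-name proper nouns, could start from a simple stoplist. The set below and the function name filter_non_names are illustrative assumptions, not part of microservice.py, and a real list would need to be much larger.

```python
from typing import Collection, Iterable, List

# Hypothetical stoplist of proper nouns that are rarely person names.
NON_NAME_PROPER_NOUNS = {
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
    "January", "February", "March", "April", "June", "July",
    "August", "September", "October", "November", "December",
    "Inc", "Ltd", "Corp",
}


def filter_non_names(
    candidates: Iterable[str],
    stoplist: Collection[str] = NON_NAME_PROPER_NOUNS,
) -> List[str]:
    """Keep only candidates that are not in the stoplist, preserving order."""
    return [candidate for candidate in candidates if candidate not in stoplist]


kept = filter_non_names(["Alice", "Monday", "Bob", "January", "Inc"])
print(kept)  # ['Alice', 'Bob']
```

"May" is left out of the stoplist on purpose because it is also a given name; ambiguous cases like this are where context-based filtering or an NER model would do better.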
Handling Multiple Languages
The current implementation is optimized for English text. To support other languages:
- Use language-specific NLTK models for tokenization and POS tagging
- Adjust regex patterns for date formats common in other locales
- The pycountry library supports country names in multiple languages
Next Steps
- Learn how to train custom models for better accuracy
- See the deployment guide to run the full application