Applications of Python Natural Language Processing in the Office

Office automation is becoming increasingly popular, and everyone wants to work more efficiently. Python is perfect for this! Today, let’s discuss how to use Python’s Natural Language Processing (NLP) tools to solve some common office problems. Don’t worry, I’ll explain everything in the simplest terms so that you can start using it right away!

Text Classification: Automatic Email Categorization

Receiving a large number of emails every day can be overwhelming. It would be great if we could automatically classify emails! Python’s NLP can easily handle this.

First, we need to prepare some pre-categorized emails as training data. Then we can combine NLTK’s stop word list with scikit-learn, which turns the text into features and trains the classifier:


import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Assume we already have training data
emails = ["Hello, how are you?", "Meeting at 3pm", "Great deal on shoes!"]
labels = ["personal", "work", "spam"]
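# Download the stop word list on first run (skipped if already present)
nltk.download('stopwords', quiet=True)
# Remove stop words; CountVectorizer expects a list here, not a set
stop_words = stopwords.words('english')
vect = CountVectorizer(stop_words=stop_words)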
X = vect.fit_transform(emails)
# Train the model
clf = MultinomialNB()
clf.fit(X, labels)
# Predict a new email
new_email = "Don't forget our meeting at 2pm"
X_new = vect.transform([new_email])
predicted_label = clf.predict(X_new)[0]
print(f"This email likely belongs to: {predicted_label}")

Friendly reminder: Don’t forget to pip install nltk scikit-learn first!

This piece of code may look intimidating, but the principle is quite simple. It converts each email into word counts, which are numbers the computer can work with, and then trains a classification model on those counts. When a new email arrives, the model predicts which category it belongs to. Keep in mind that three sample emails are only enough for a demo; a useful classifier needs many labeled examples per category.
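To make the "text into numbers" step more concrete, here is a minimal sketch of the word-count idea using only the standard library. The helper names build_vocabulary and to_count_vector are made up for this illustration; they mimic what CountVectorizer does in spirit, not its actual implementation (for example, there is no stop word removal here).

```python
import re
from collections import Counter

def build_vocabulary(texts):
    """Map every distinct lowercase word in the texts to a column index."""
    words = sorted({w for t in texts for w in re.findall(r"[a-z0-9]+", t.lower())})
    return {word: index for index, word in enumerate(words)}

def to_count_vector(text, vocabulary):
    """Turn one text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    # Words that are not in the vocabulary are simply ignored
    return [counts.get(word, 0) for word in vocabulary]

emails = ["Meeting at 3pm", "Great deal on shoes!"]
vocab = build_vocabulary(emails)
print(vocab)
print(to_count_vector("Meeting about the shoes deal", vocab))
```

Each position in the printed list corresponds to one vocabulary word, so two emails that share many words end up with similar vectors. That similarity is what the classifier learns from.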

Text Summarization: Meeting Minutes Generator

After a meeting, writing the minutes can be a hassle! With a simple word-frequency scoring method (a lighter alternative to graph-based algorithms such as TextRank), you can quickly generate a decent summary.


import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
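# Download tokenizer data and the stop word list on first run
# (newer NLTK releases use 'punkt_tab' instead of 'punkt')
for resource in ('punkt', 'punkt_tab', 'stopwords'):
    nltk.download(resource, quiet=True)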
def generate_summary(text, num_sentences=3):
    sentences = sent_tokenize(text)
    words = word_tokenize(text.lower())
    stop_words = set(stopwords.words('english'))
    # Count how often each word appears, skipping stop words and punctuation
    word_frequencies = {}
    for word in words:
        if word.isalnum() and word not in stop_words:
            word_frequencies[word] = word_frequencies.get(word, 0) + 1
    # Score each sentence by summing the frequencies of the words it contains
    sentence_scores = {}
    for sentence in sentences:
        if len(sentence.split(' ')) >= 30:
            continue  # Skip very long sentences
        for word in word_tokenize(sentence.lower()):
            if word in word_frequencies:
                sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_frequencies[word]
    # Pick the highest-scoring sentences, then put them back in their original order
    top_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:num_sentences]
    summary_sentences = [s for s in sentences if s in top_sentences]
    summary = ' '.join(summary_sentences)
    return summary
# Example usage
meeting_notes = """
In today's product meeting, we discussed the development plan for new features.
The team unanimously agreed to prioritize improvements to user experience.
Additionally, we discussed how to enhance system performance and stability.
The marketing department proposed some suggestions regarding product promotion.
The technical team indicated that they will begin addressing known bugs next week.
Finally, we determined the focus and timeline for the next stage of work.
"""
summary = generate_summary(meeting_notes)
print("Meeting minutes summary:")
print(summary)

What is the idea behind this algorithm? It identifies the most important sentences in the text. It counts how often each non-stop word appears and treats that count as the word’s importance score. Each sentence is then scored by adding up the scores of the words it contains. Finally, the highest-scoring sentences are combined to create a summary!
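The function above ranks sentences by word frequency. For comparison, here is a rough sketch of the graph-based idea behind TextRank, written with only the standard library. The function name textrank_summary, the regex-based sentence splitter, and the default parameters are assumptions made for this illustration; this is a simplified version, not a reference implementation.

```python
import math
import re

def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    """Rank sentences with a simplified TextRank; return the top ones in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace (assumes plain English prose)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    tokens = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    def similarity(a, b):
        # Shared words, normalized by sentence length (similar to TextRank's normalization)
        if len(a) < 2 or len(b) < 2:
            return 0.0
        return len(a & b) / (math.log(len(a)) + math.log(len(b)))

    # Build the weighted sentence graph as an n x n matrix
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]

    # Iterate the PageRank-style update a fixed number of times
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]

notes = ("The budget review covers the marketing plan. "
         "The marketing plan needs a larger budget. "
         "Lunch was pizza. "
         "The budget review is due Friday.")
print(textrank_summary(notes))
```

Here a sentence is important when it shares words with other important sentences, so an off-topic line such as the lunch remark ends up with the lowest score. For short meeting notes, the simpler frequency method is usually good enough.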

Sentiment Analysis: Customer Feedback Classifier

What’s the scariest thing when building a product? Not understanding what your users are telling you, of course! With sentiment analysis, you can quickly get a sense of how users feel.


from textblob import TextBlob
def analyze_sentiment(text):
    analysis = TextBlob(text)
    if analysis.sentiment.polarity > 0:
        return 'Positive'
    elif analysis.sentiment.polarity == 0:
        return 'Neutral'
    else:
        return 'Negative'
# Example usage
feedbacks = [
    "This product is amazing, I love it!",
    "It's okay, but it could be improved.",
    "It's a total waste of money, it’s useless."
]
for feedback in feedbacks:
    sentiment = analyze_sentiment(feedback)
    print(f"Feedback: {feedback}")
    print(f"Sentiment: {sentiment}\n")

Friendly reminder: Remember to first pip install textblob, then run python -m textblob.download_corpora to download the required corpora.

This code is super simple; it uses the TextBlob library to analyze the sentiment of the text. It assigns each piece of text a polarity score between -1 and 1: positive scores indicate positive sentiment, negative scores indicate negative sentiment, and zero indicates neutral.
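In practice, very few texts score exactly zero, so almost nothing gets labeled neutral. One possible tweak is to treat a small band around zero as neutral. The sketch below assumes a helper called label_polarity and a threshold of 0.1; both are choices made for this illustration, and the sample scores are made-up numbers standing in for TextBlob(text).sentiment.polarity.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label, treating small scores as neutral."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

# Hypothetical polarity scores, not real TextBlob output
sample_scores = [0.62, 0.05, -0.4]
labels = [label_polarity(score) for score in sample_scores]
print(labels)
```

The right threshold depends on your data, so it is worth checking a handful of real feedback messages before settling on a value.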

Named Entity Recognition: Automatic Key Information Extraction

Are your eyes tired from looking at contracts and reports? Use Named Entity Recognition (NER) to automatically extract important information!


import spacy
# Load the English model
nlp = spacy.load("en_core_web_sm")
def extract_entities(text):
    doc = nlp(text)
    # Group entities by label so that repeated types (e.g., two dates) are all kept
    entities = {}
    for ent in doc.ents:
        entities.setdefault(ent.label_, []).append(ent.text)
    return entities
# Example usage
contract_text = """
John Smith, CEO of Tech Solutions Inc., agrees to provide consulting services to
XYZ Corporation from January 1, 2023 to December 31, 2023 for a fee of $100,000.
"""
entities = extract_entities(contract_text)
print("Extracted key information:")
for entity_type, entity_values in entities.items():
    print(f"{entity_type}: {', '.join(entity_values)}")

Don’t forget to first pip install spacy, then run python -m spacy download en_core_web_sm to download the English model.

This piece of code uses the spaCy library, which can identify names, organizations, dates, amounts, and other information in the text. For lengthy contracts, this is a lifesaver!
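Labels such as PERSON, ORG, DATE, and MONEY are spaCy’s own entity label names, which are not very friendly in a report. Here is a small sketch of how the extracted entities might be regrouped under readable headings. The FIELD_NAMES mapping and the group_entities helper are invented for this example, and the sample pairs are hand-written stand-ins rather than actual model output; with spaCy you would build them with [(ent.label_, ent.text) for ent in doc.ents].

```python
from collections import defaultdict

# Readable headings for a few spaCy entity labels (extend as needed)
FIELD_NAMES = {
    "PERSON": "People",
    "ORG": "Organizations",
    "DATE": "Dates",
    "MONEY": "Amounts",
}

def group_entities(pairs):
    """Group (label, text) pairs under readable headings, dropping duplicates."""
    grouped = defaultdict(list)
    for label, text in pairs:
        field = FIELD_NAMES.get(label)
        # Skip labels without a heading, and texts already recorded
        if field and text not in grouped[field]:
            grouped[field].append(text)
    return dict(grouped)

# Hand-written sample pairs, not real spaCy output
sample_pairs = [
    ("PERSON", "John Smith"),
    ("ORG", "Tech Solutions Inc."),
    ("ORG", "XYZ Corporation"),
    ("DATE", "January 1, 2023"),
    ("DATE", "December 31, 2023"),
    ("MONEY", "100,000"),
    ("ORG", "XYZ Corporation"),
]
report = group_entities(sample_pairs)
for heading, values in report.items():
    print(f"{heading}: {'; '.join(values)}")
```

A dictionary like this is easy to drop into a spreadsheet row or a summary email.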

Alright, that’s it for today’s sharing. Python’s NLP has many interesting applications in the office, such as automatically generating reports, intelligent Q&A systems, and more. If you’re interested, you can continue to explore! Remember, applying what you learn is key, so hurry up and put these skills to use in your work!
