Introduction to NLP: Rule-Based Chinese Composite Event Extraction with Python Source Code

Author: IT Duck

Images: IT Duck, Internet

Introduction

What is a composite event? A composite event is a sentence that links two events through a logical relation. The types covered here are conditional events, causal events, sequential events, and adversative (reversal) events.

Many real NLP projects require composite event extraction. Examples include extracting events to build event-relationship graphs for knowledge graphs, and extracting events from chat dialogues to identify user intent. Composite event extraction is a fundamental NLP module, and much about it remains open to research.

This article introduces a rule-based method for extracting Chinese events, based on open-source code shared online. The source code has been reworked to separate data from code, which makes it easier to add event extraction rules later and keeps the code simple enough to follow.

The best way to learn is to read excellent source code, and the best test of what you have learned is to modify that code and build your own ideas into it. Let's work through this project together with Duck.

Technical Challenges

The first challenge is reading and saving JSON files, along with handling the encoding issues that come with Chinese text.

Reading JSON files
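A minimal sketch of reading a UTF-8 JSON file. The function name load_json and the sample content are assumptions made for illustration, not the article's original code. Passing encoding="utf-8" explicitly avoids relying on the platform's default encoding, which is a common cause of errors with Chinese text.

```python
import json
import os
import tempfile


def load_json(path):
    """Read a JSON file as UTF-8 and return the parsed Python object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Demonstration with a temporary file (the content is made up for this example).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write('{"pre": "因为", "pos": "所以"}')
    loaded = load_json(sample_path)
```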

When saving files, call json.dump with the ensure_ascii parameter set to False. Chinese characters are then written as-is instead of being converted to \uXXXX escape sequences, so the saved file stays readable.

Saving JSON files
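The effect of ensure_ascii can be seen in a short sketch. The helper name save_json and the sample record are assumptions for illustration; json.dump and json.dumps are the standard library calls.

```python
import json
import os
import tempfile


def save_json(obj, path):
    """Write obj to path as UTF-8 JSON, keeping Chinese characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


record = {"type": "causal", "pre": "因为", "pos": "所以"}

# Default behaviour (ensure_ascii=True) escapes every non-ASCII character.
escaped_text = json.dumps(record)
# With ensure_ascii=False the characters are kept as they are.
readable_text = json.dumps(record, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "out.json")
    save_json(record, out_path)
    with open(out_path, "r", encoding="utf-8") as f:
        saved_text = f.read()
```

Both forms load back to the same data; the difference only matters to a person reading or editing the file by hand.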

How do we extract composite events? Here we use regular expressions, one of the most commonly used rule-based methods.

Event sentence patterns involve hundreds of connecting words and thousands of combinations of them, so writing each regular expression by hand is impractical. Instead, we write a function that generates regular expressions dynamically: the fixed connecting words are organized by hand (see the data file data.json), and the program then generates every pattern variant automatically.

In a pattern built around a pair of connecting words such as “not only” and “but also”, we treat the two words as variables and replace them with placeholders in Python. Reading the connecting words from the file and assigning each pair to “pre” and “pos” in turn generates all the regular expressions.
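The placeholder idea can be sketched as follows. The template, the group names first and second, and the helper build_pattern are assumptions for illustration; only the names pre and pos come from the text, and the real template in the source code may differ.

```python
import re

# Hypothetical template: "<pre> first clause, <pos> second clause".
# {pre} and {pos} are filled in by str.format; the comma may be full-width or ASCII.
TEMPLATE = r"{pre}(?P<first>[^,,]+)[,,]\s*{pos}(?P<second>[^。.!!??]+)"


def build_pattern(pre, pos):
    """Fill the template with one pair of connecting words and compile it."""
    return re.compile(TEMPLATE.format(pre=re.escape(pre), pos=re.escape(pos)))


pattern = build_pattern("不但", "而且")  # "not only ... but also"
match = pattern.search("不但价格上涨,而且供应减少。")
first_clause = match.group("first")
second_clause = match.group("second")
```

re.escape guards against connecting words that happen to contain characters with a special meaning in regular expressions.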

The code in this article is adapted from open-source code, with optimizations and improvements. We separate data from code and apply the strategy pattern from design patterns. The strategy pattern addresses the maintenance problems that arise when several similar algorithms are selected through long “if…else…” chains: each algorithm lives in its own unit and is chosen by lookup, which avoids stacked conditionals and makes new cases easy to add.

If your code has a lot of “if…else…” branches, consider whether the strategy pattern would fit.
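A minimal sketch of the strategy pattern in Python, using a dictionary of functions in place of an “if…else…” chain. The function names and output strings are hypothetical and are not taken from the project's source code.

```python
def describe_causal(first, second):
    return f"cause: {first} -> effect: {second}"


def describe_conditional(first, second):
    return f"condition: {first} -> result: {second}"


# Each event type maps to its own strategy. Supporting a new type means
# adding one entry here rather than another elif branch.
STRATEGIES = {
    "causal": describe_causal,
    "conditional": describe_conditional,
}


def describe_event(event_type, first, second):
    """Look up the strategy for event_type and apply it."""
    try:
        strategy = STRATEGIES[event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event_type}") from None
    return strategy(first, second)


summary = describe_event("causal", "rain", "flood")
```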

Overall Project Framework Diagram

The overall framework diagram helps you keep track of the project's structure and of how far development has progressed.

Detailed Code

Import the necessary Python packages. Here, re is Python's regular expression module, used for pattern matching.

All event connecting words are written into data.json, with the format shown in the image below. Keeping them in a JSON file makes the project easy to extend: new connecting words and new event types can be added without modifying the source code.
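Since the exact layout of data.json is not reproduced in this text, the sketch below shows one plausible layout as an assumption: each event type maps to a list of [pre, pos] pairs. The keys and the word pairs are illustrative only.

```python
import json

# Hypothetical layout: event type -> list of [pre, pos] connecting-word pairs.
EXAMPLE_DATA = {
    "causal": [["因为", "所以"], ["由于", "因此"]],
    "conditional": [["如果", "那么"], ["只要", "就"]],
    "adversative": [["虽然", "但是"]],
    "sequential": [["首先", "然后"]],
}

# What the file content would look like on disk.
example_text = json.dumps(EXAMPLE_DATA, ensure_ascii=False, indent=2)
round_tripped = json.loads(example_text)
```

With a layout like this, adding an event type is a matter of adding one key to the file.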

Encapsulate composite event extraction in a class. The class initialization function (__init__) follows the standard form.
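A hypothetical skeleton of such a class is sketched below. The class name EventExtractor, its attributes, and the optional connectives argument are assumptions for illustration; the original class is not shown in this text.

```python
import json


class EventExtractor:
    """Hypothetical skeleton of a composite event extractor."""

    def __init__(self, data_path="data.json", connectives=None):
        self.data_path = data_path
        # Allow data to be passed in directly (handy for tests);
        # otherwise read it from the JSON file.
        if connectives is None:
            connectives = self._load(data_path)
        self.connectives = connectives
        # Compiled regular expressions are filled in by a later step.
        self.patterns = {}

    @staticmethod
    def _load(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)


extractor = EventExtractor(connectives={"causal": [["因为", "所以"]]})
```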

The data loading module reads the data from data.json and organizes it into the format the rest of the code expects.
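One way to "organize the data format" is to flatten the file into uniform records. The sketch below assumes a file layout of event type mapped to [pre, pos] pairs; both the layout and the function name load_connectives are illustrative assumptions.

```python
import json
import os
import tempfile


def load_connectives(path):
    """Read the JSON file and flatten it into (event_type, pre, pos) tuples."""
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    rows = []
    for event_type, pairs in raw.items():
        for pre, pos in pairs:
            rows.append((event_type, pre.strip(), pos.strip()))
    return rows


# Demonstration with a temporary stand-in for data.json.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")
    sample = {"causal": [["因为", "所以"]], "conditional": [["如果", "那么"]]}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    rows = load_connectives(path)
```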

Based on the data obtained in the data loading module, generate all regular expressions.
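Generating every pattern then amounts to looping over the loaded records. This sketch assumes records of the form (event_type, pre, pos) and a simple two-clause template; both are illustrative choices, not the project's actual data structures.

```python
import re

# Hypothetical two-clause template with placeholders for the connecting words.
TEMPLATE = r"{pre}(?P<first>[^,,。]+)[,,]{pos}(?P<second>[^,,。]+)"


def generate_patterns(rows):
    """Compile one regular expression per (event_type, pre, pos) record."""
    patterns = []
    for event_type, pre, pos in rows:
        source = TEMPLATE.format(pre=re.escape(pre), pos=re.escape(pos))
        patterns.append((event_type, re.compile(source)))
    return patterns


rows = [("causal", "因为", "所以"), ("conditional", "如果", "那么")]
patterns = generate_patterns(rows)
```

Compiling once up front means the matching step does not rebuild the expressions for every sentence.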

Using regular expressions, match and extract events from each input sentence.
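The matching step might look like the sketch below, which tries each compiled pattern against a sentence and returns the first hit. The two hard-coded patterns and the result keys are assumptions for illustration.

```python
import re

# Two hard-coded patterns stand in for the generated ones.
PATTERNS = [
    ("causal", re.compile(r"因为(?P<first>[^,,。]+)[,,]所以(?P<second>[^,,。]+)")),
    ("conditional", re.compile(r"如果(?P<first>[^,,。]+)[,,]那么(?P<second>[^,,。]+)")),
]


def extract_from_sentence(sentence):
    """Return a dict for the first pattern that matches, or None."""
    for event_type, regex in PATTERNS:
        match = regex.search(sentence)
        if match:
            return {
                "type": event_type,
                "first": match.group("first").strip(),
                "second": match.group("second").strip(),
            }
    return None


event = extract_from_sentence("如果明天下雨,那么活动延期")
no_event = extract_from_sentence("今天天气很好")
```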

Here we split the input text and perform event extraction on each sentence.
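A simple way to split text into sentences is to break on sentence-ending punctuation. The punctuation set below is an assumption; the ASCII period is left out on purpose so that decimals such as 3.5 are not split.

```python
import re


def split_sentences(text):
    """Split on Chinese and ASCII sentence-ending punctuation and newlines."""
    parts = re.split(r"[。!?!?;;\n]+", text)
    # Drop empty fragments, such as the one after the final punctuation mark.
    return [part.strip() for part in parts if part.strip()]


sentences = split_sentences("因为下雨,所以比赛取消。如果明天放晴,那么比赛继续!")
```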

Write a main function to test the class.
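A test entry point could be sketched like this. TinyExtractor is a stand-in with a single hard-coded causal pattern, used only so that the example runs on its own; it is not the article's class.

```python
import re


class TinyExtractor:
    """Stand-in class that handles one hard-coded causal pattern."""

    PATTERN = re.compile(r"因为(?P<cause>[^,,。]+)[,,]所以(?P<effect>[^,,。]+)")

    def extract(self, text):
        return [m.groupdict() for m in self.PATTERN.finditer(text)]


def main():
    extractor = TinyExtractor()
    text = "因为下雨,所以比赛取消。因为堵车,所以他迟到了。"
    results = extractor.extract(text)
    for item in results:
        print(item)
    return results


if __name__ == "__main__":
    main()
```

Returning the results from main, in addition to printing them, makes the function easy to check from a test.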

Finally, print the results. You can adapt this step to produce whatever output format you need.
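As an illustration of adapting the output, the sketch below renders the same events both as plain text and as JSON. The event structure (type, first, second) is an assumption made for this example.

```python
import json


def format_events(events):
    """Render each event on one line as: [type] first -> second."""
    return "\n".join(
        f"[{e['type']}] {e['first']} -> {e['second']}" for e in events
    )


events = [{"type": "causal", "first": "下雨", "second": "比赛取消"}]

text_report = format_events(events)
# ensure_ascii=False keeps the Chinese text readable in the JSON output.
json_report = json.dumps(events, ensure_ascii=False, indent=2)
print(text_report)
```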

Conclusion

This project involves regular expressions, file reading, and the strategy pattern from design patterns. These are basic programming skills that build up gradually through everyday coding; they cannot be acquired overnight.

Learning Python here won't cost you even the price of a cup of milk tea, just a follow. If you found the article helpful, please like and share it. To get the source code, follow me and send a private message with the text: Python Event Extraction. Thank you for reading, and I wish you a happy life.

This is an original article by IT Duck. Feel free to follow, and let's learn together!
