Conversations, books, messages, songs, movies… It is hard to imagine a world without information. Every day we encounter a vast amount of text and speech. Natural language processing aims to convert human languages into representations that computers can work with, so that people and machines can ultimately interact through language.
01
What Is Natural Language Processing?
Natural Language Processing (NLP) can be broken into two parts: “natural language” and “processing”. Let’s look at natural language first. Unlike computer languages, natural languages are means of communication that emerged over the course of human development. They take both spoken and written forms and reflect human thought. Every human language is a natural language, including Chinese, English, and French.
Now let’s look at “processing”. If this meant processing by hand, linguistics already exists as a field to study it, and there would be no need to stress the word “natural”. The “processing” here must therefore be done by computers. Computers, however, are not human and cannot handle text the way people do; they need methods of their own. Put simply, natural language processing is when a computer accepts user input in natural language, processes it internally with algorithms defined by humans so as to simulate human understanding of that language, and returns the expected result to the user.
Just as machines freed human hands, natural language processing aims to let computers take over the manual handling of large volumes of natural language information. It is an interdisciplinary field that spans artificial intelligence, computer science, and information engineering, and draws on statistics, linguistics, and more. Because language is so closely tied to human thought, natural language processing is often called the “crown jewel of artificial intelligence”.
02
What Can It Do For Us?
Information Retrieval
Information retrieval refers to the process and techniques of organizing information in a structured way and finding the items relevant to a user’s needs. Its goal is to return the required information accurately, promptly, and comprehensively.
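To make the idea of "finding relevant information" concrete, here is a minimal sketch that ranks a few documents against a query. It assumes a simple TF-IDF style score and a whitespace tokenizer; the function names (tokenize, rank_documents) and the sample documents are made up for illustration, and real retrieval systems are considerably more elaborate.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def rank_documents(query, documents):
    """Return document indices ordered by a simple TF-IDF overlap score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in term_freq:
                # Smoothed inverse document frequency (an assumed variant).
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += (term_freq[term] / len(tokens)) * idf
        scores.append((score, index))
    # Highest score first; ties keep the original document order.
    ordered = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in ordered]


documents = [
    "machine translation converts text between languages",
    "speech recognition converts audio signals into text",
    "information retrieval finds relevant documents for a query",
]
ranking = rank_documents("relevant documents", documents)
print(ranking)
```

Only the third document contains the query terms, so it is ranked first; the other two score zero and keep their original order.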
Machine Translation
Machine translation, also known as automatic translation, is the process of using computers to convert one natural language (source language) into another natural language (target language).
Automated Question Answering Systems
Automated question answering refers to the task of automatically answering questions posed by users to meet their knowledge needs. To answer a question, the system must first understand it correctly, extract the key information from it, and then retrieve and match an answer from an existing corpus or knowledge base to return to the user.
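The steps described above (extract key information from the question, then retrieve and match against a knowledge base) can be sketched roughly as follows. This is a toy illustration under stated assumptions: the stopword list, the Jaccard overlap score, the helper names (key_terms, answer), and the two-entry knowledge base are all hypothetical and stand in for the much richer understanding a real system needs.

```python
# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"what", "is", "the", "a", "an", "of", "does", "do", "how", "which"}


def key_terms(question):
    """Extract key terms: lowercase, strip punctuation, drop stopwords."""
    words = (word.strip("?.,!").lower() for word in question.split())
    return {word for word in words if word and word not in STOPWORDS}


def answer(question, knowledge_base):
    """Return the answer whose stored question shares the most key terms."""
    query_terms = key_terms(question)
    best_answer, best_score = None, 0.0
    for stored_question, stored_answer in knowledge_base:
        stored_terms = key_terms(stored_question)
        union = query_terms | stored_terms
        # Jaccard similarity: shared terms divided by all distinct terms.
        score = len(query_terms & stored_terms) / len(union) if union else 0.0
        if score > best_score:
            best_answer, best_score = stored_answer, score
    # None means nothing in the knowledge base matched at all.
    return best_answer


knowledge_base = [
    ("What is machine translation?",
     "Converting text from a source language into a target language."),
    ("What is speech recognition?",
     "Converting speech signals into text or commands."),
]
print(answer("What does speech recognition do?", knowledge_base))
```

Returning None when nothing overlaps is a design choice for the sketch: it lets the caller distinguish "no answer found" from a weak match.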
Speech Recognition
Speech recognition technology lets machines convert speech signals into the corresponding text or commands through a process of recognition and understanding. It draws on several fields, including signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence.
03
Basic Technologies
Named Entity Recognition: Identifying entities in a text that carry specific meaning, mainly the names of people, places, and organizations, along with other proper nouns.
Coreference Resolution: Refers to resolving pronouns in the text, such as “he” and “this,” to their corresponding entities.
Keyword Extraction: Automatically extracting the words or phrases that reflect the theme of a text.
Word Vectors and Word Embeddings: Mapping words into a low-dimensional vector space while preserving the relationships between them, so that related words end up close together.
Text Generation: Given specific text input, generating the required text, mainly applied in text summarization, dialogue systems, machine translation, question answering systems, and more.
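As a small illustration of the word vector idea in the list above, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are invented by hand purely for demonstration; real embeddings are learned from large corpora and typically have hundreds of dimensions. The helper names (cosine_similarity, most_similar) are likewise hypothetical.

```python
import math

# Hypothetical hand-made vectors, for illustration only.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector points in the closest direction."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("king"))
```

With these made-up vectors, "king" and "queen" point in nearly the same direction while "apple" does not, which is the sense in which an embedding space preserves relationships between words.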