Preface: The previous article summarized some basic knowledge of natural language processing. Next, we give a brief introduction to some practical applications of NLP, including information extraction, automatic summarization, and question answering systems, with this article focusing on question answering. As with the previous article, the content is mainly excerpted from the second part of the book “Multilingual Natural Language Processing” and covers only basic concepts. (Note: The Chinese edition of the book was published in February 2015, so the research and viewpoints it mentions predate 2015.)
1. Entity Detection and Tracking
Information extraction (IE) refers to the process of identifying and extracting useful textual information from natural language documents. What counts as “useful” is determined by the user and the application. For an input document, we often care about “who did what to whom, when, and why”. Obviously, the scope of information extraction can be arbitrarily broad and may sometimes require world knowledge. To simplify the problem, we focus only on the following two sub-tasks:
1) Detect mentions in the document and identify their attributes; a mention is a span of text that refers to a real-world object (such as a person or an organization);
2) Group mentions that refer to the same object into entities; an entity is the collection of all mentions that refer to the same object.
These two sub-problems are crucial steps for document understanding, as they determine the important conceptual objects and their relationships in the discourse.
The first problem is called mention detection, which involves detecting the boundaries of a mention and optionally determining its semantic type (such as person or organization) and other attributes (such as whether it is a name, a nominal, or a pronoun). The second problem is called coreference resolution, which groups mentions that refer to the same entity into equivalence classes. Since solving these two problems lets us identify entities and their attributes within a document, we also refer to them collectively as “Entity Detection and Tracking” (EDT).
A mention can be a name, a nominal, or a pronoun. For example:
“Ford President said he had no comment”.
This sentence contains three person mentions: Ford, President, and he. “Ford” is a name, “President” is a noun, and “he” is a pronoun. Clearly, President and Ford refer to the same person, so we say they belong to the same entity. However, due to limited context, it is ambiguous whether “he” refers to Ford the President. If taken out of context, Ford could also refer to an organization (Ford Motor Company), as in “Ford sold 10 million cars in the first quarter.” Just like many other issues in NLP, this ambiguity is a major difficulty faced by entity detection and tracking (EDT).
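To make the two sub-tasks concrete, here is a minimal Python sketch, not taken from the book, of how the mentions and the entity in the example sentence might be represented; the Mention and Entity classes and their fields are illustrative assumptions.

```python
# A minimal sketch of mention detection and coreference output for the
# sentence "Ford President said he had no comment". The data layout is
# an illustrative assumption, not a specification from the book.
from dataclasses import dataclass, field

@dataclass
class Mention:
    text: str          # surface form, e.g. "Ford"
    mention_type: str  # "NAME", "NOMINAL", or "PRONOUN"
    semantic_type: str # e.g. "PERSON", "ORGANIZATION"
    start: int         # token offsets of the mention span
    end: int

@dataclass
class Entity:
    mentions: list = field(default_factory=list)  # coreferent mentions

# Output of mention detection: three PERSON mentions with their attributes.
mentions = [
    Mention("Ford", "NAME", "PERSON", 0, 1),
    Mention("President", "NOMINAL", "PERSON", 1, 2),
    Mention("he", "PRONOUN", "PERSON", 3, 4),
]

# Output of coreference resolution: all three grouped into a single entity,
# assuming the pronoun "he" is resolved to Ford the president.
entity = Entity(mentions=mentions)
print([m.text for m in entity.mentions])  # ['Ford', 'President', 'he']
```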
The most successful approaches to mention detection and coreference resolution are data-driven statistical methods. In these approaches, statistical models are learned automatically from manually annotated training data, and the learned models can then be applied to unseen documents. Compared with rule-based systems, statistical methods have several advantages:
1) Data-driven methods can quickly test different algorithms and features;
2) Statistical systems can keep improving as new training data becomes available;
3) Statistical systems can be ported to other languages relatively easily.
From the perspective of system architecture, there are two types of EDT systems:
1) Cascade Systems. In this system, a mention detection component is followed by a coreference resolution component in series. The advantage of this architecture is that there is a clear boundary between the two subsystems, which can be developed and improved independently. For example, the mention detection system can be trained on one dataset, while the coreference resolution system can use a completely different dataset for training. Because of their separation, the system can easily identify and correct errors. The disadvantage of cascade structures is that these two problems are inherently closely linked but are solved in isolation.
2) Joint Systems. Another architecture is to solve both problems jointly. Joint solving means that the system will attempt to find the coreference chains while performing mention detection: it first assumes a mention and then looks for possible referents that have appeared before; in other words, the mention detection operation and the coreference resolution operation are interleaved. The advantage of this architecture is that it has “global” optimal system parameters, but its algorithm’s time and space complexity is usually much larger than that of the corresponding cascade system.
Cascade systems have been proven to perform very well in practice.
In some natural language applications, obtaining isolated mentions from a document is not enough. Consider, for example, the following passage:
“John F. Kennedy was the thirty-fifth President of the United States. He was assassinated on Friday, November 22, 1963.”
How do we answer the question “When was John F. Kennedy assassinated?”
John F. Kennedy is referred to by the pronoun “he”, so the answer can be found in the latter sentence. Therefore, to correctly answer this question, it is crucial to know that “he” refers to “John F. Kennedy”. The process of linking those mentions pointing to the same physical object to an entity is called coreference resolution.
Mention detection and coreference resolution can be solved jointly, because some mention decisions require guidance from coreference resolution (and vice versa). Although in practice the complexity of joint systems often obscures their advantages, they still deserve deeper study in the future. The outputs of mention detection and coreference resolution lay the foundation for further in-depth analysis, such as relation and event extraction, and can also be used directly in applications such as question answering or machine translation.
2. Relations and Events
Words are everywhere, and they are increasingly stored in digital form. As of 2008, there were over 1 trillion distinct webpages in the world, and this number was growing by more than 1 billion each day, with each webpage containing at least some text. Natural language text is full of ambiguity but rich in information, and the sheer volume of available text can itself help resolve some of that ambiguity.
With so many electronic text documents, there is an increasing demand for computer systems that can process these natural language texts. Such computer systems need to automatically consolidate free-form, ambiguous texts into more accurate and compact structured representations and be able to access and process large volumes of documents more efficiently. For example, a company needs to track user feedback on its products; a politician needs to understand his constituents’ attitudes toward his views; an analyst needs to record the behavior and discourse of specific individuals, groups, or organizations.
To enable computers to come close to fully understanding the content of natural language texts, a model must include syntax, semantics, pragmatics, and world knowledge, with appropriately rich meaning expressions. However, relative to this complete level of understanding, we explore a more limited issue: that is, extracting relevant information to fill a “database” of facts related to each specific task. More specifically, we define the problem as finding all relevant entities in a corpus text, identifying all relevant attributes of these entities, and identifying all relevant relationships between entities, and storing this information in a structured manner.
Intuitively, once such facts have been filled in, our database can answer the following types of questions through very simple database queries (a toy query sketch follows the list):
1) Who are the people or entities mentioned in a specific document or document set?
2) How many employees are in a particular company, and what are their names?
3) What are the relationships between some people or entities?
4) What events are mentioned in a document or a series of documents?
5) When did certain events occur?
6) Where did certain types of events take place?
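As an illustration of the kind of “very simple database queries” meant here, the sketch below stores extracted entities and relations in SQLite and answers question types 1) and 3) with plain SQL; the table layout and the sample rows are assumptions made for illustration, not a schema from the book.

```python
# Toy sketch: once facts have been extracted, simple SQL suffices for
# questions such as "who is mentioned?" and "what relations hold?".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities  (id INTEGER PRIMARY KEY, name TEXT, type TEXT, doc_id TEXT);
CREATE TABLE relations (subj INTEGER, rel TEXT, obj INTEGER);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?, ?, ?)", [
    (1, "John Smith", "PERSON", "doc1"),
    (2, "Acme Corp", "ORGANIZATION", "doc1"),
])
conn.execute("INSERT INTO relations VALUES (1, 'employed_by', 2)")

# 1) Who is mentioned in document doc1?
print(conn.execute(
    "SELECT name FROM entities WHERE doc_id = 'doc1'").fetchall())

# 3) What relations hold between entities?
print(conn.execute("""
    SELECT e1.name, r.rel, e2.name
    FROM relations r
    JOIN entities e1 ON r.subj = e1.id
    JOIN entities e2 ON r.obj = e2.id
""").fetchall())
```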
Entity detection and tracking addresses how to identify mentions and their types in text and how to group coreferent mentions. Here, we focus on finding semantic relations between entities. Systems that handle this task are typically called relation extraction systems.
The term relation extraction has several meanings in the natural language processing literature. Broadly, we can distinguish two main lines of research. The first involves three special types of relation extraction:
1) Extracting relations associated with lexical ontologies, such as part-whole relations, hierarchical relations, and manner relations;
2) Extracting essentially similar relations, such as discovering that verb 1 and verb 2 express the same concept, but verb 1 is stronger;
3) Finding similar premises, i.e., identifying that the action represented by verb 1 is a prerequisite for the action represented by verb 2.
The second line of research addresses identifying more general semantic connections between potentially heterogeneous entities, such as employment relations between people and companies, relations between diseases and the people whose deaths they cause, or ownership relations between an entity (such as a company) and its owner.
Suppose we need to build a multilingual system that can identify cases where PERSON entities are described as owners of other entities. This and many other types of semantic relations are typically expressed within a single sentence, so the most common approach in the literature is to build a system that finds intra-sentence relations. In this setting, we want a system that analyzes already-identified entity mentions and, for a given pair of mentions, decides whether an “ownership” relation mention holds between them.
Furthermore, we want a system that can identify relationships between entities without considering whether the two entities are mentioned in the same sentence. However, here we assume that mentions of two entities within the same sentence serve as evidence of their relationship, even if one or both entities appear in pronoun form. For example, “he owns it”, where “he” refers to a PERSON entity, and “it” refers to the company owned by the PERSON entity.
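As a very rough illustration of intra-sentence relation mention detection, the toy sketch below flags an “ownership” relation between two already-detected mentions when a possession verb connects them; the verb list, mention format, and relation label are assumptions made here for illustration, not the method described in the book.

```python
# Toy sketch: given mentions already detected in one sentence, flag an
# "owner-of" relation mention when a possession verb links a PERSON mention
# to another mention (e.g. "he owns it"). Purely illustrative.
import re

OWNERSHIP_VERBS = r"\b(owns?|owned|acquired|purchased|bought)\b"

def ownership_relations(sentence, mentions):
    """mentions: list of (surface_text, semantic_type) found in the sentence."""
    relations = []
    for i, (subj, subj_type) in enumerate(mentions):
        for j, (obj, _obj_type) in enumerate(mentions):
            if i == j or subj_type != "PERSON":
                continue
            # Require "<subj> ... <possession verb> ... <obj>" in that order.
            pattern = re.escape(subj) + r".*" + OWNERSHIP_VERBS + r".*" + re.escape(obj)
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                relations.append((subj, "owner-of", obj))
    return relations

print(ownership_relations("He owns it",
                          [("He", "PERSON"), ("it", "ORGANIZATION")]))
# [('He', 'owner-of', 'it')]
```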
In fact, we have already seen one relation extraction system: a coreference resolution system, which finds the “same entity” relation between coreferent entity mentions in a document. But how do we handle relations involving more than two entities? When a relation involves a change in the state of one or more entities, we call it an event.
In a broad sense, events refer to any change in state in the world that can be described using natural language text. Event extraction refers to the process of extracting a structured representation of that state change using any algorithm, especially including the entities involved. Typically, a word, usually a verb, indicates the change in state, and the arguments of the verb are usually the entities participating in the event. Therefore, events can be viewed as a generalization of relations, a collection of relations between entities and a single trigger (typically still a verb).
Event extraction systems identify sets of entities that undergo state changes. For example, in the sentence “Mary bought apples for $20”, there is the event “bought” and three entities: “Mary”, “apples”, and “$20”. Using a predicate-argument structure, we can represent such an event with a ternary predicate such as bought(Mary, apples, $20), or with a pair of binary predicates: bought(Mary, apples) and paid(Mary, $20).
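The predicate-argument view of the “Mary bought apples for $20” example can be written down directly; the sketch below is simply a literal rendering of that representation, with the dictionary and tuple layouts chosen here for illustration.

```python
# Literal rendering of the event from "Mary bought apples for $20" as a
# predicate-argument structure: a trigger plus a set of entity arguments.
event = {
    "trigger": "bought",          # the word signalling the state change
    "arguments": {                # entities participating in the event
        "buyer": "Mary",
        "goods": "apples",
        "price": "$20",
    },
}

# Equivalent ternary-predicate view: bought(Mary, apples, $20)
ternary = ("bought", "Mary", "apples", "$20")

# Equivalent pair of binary predicates: bought(Mary, apples), paid(Mary, $20)
binary_pair = [("bought", "Mary", "apples"), ("paid", "Mary", "$20")]

print(event["trigger"], list(event["arguments"].values()))
```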
The main goal of relation and event extraction is to represent the information in the text in a structured manner so that it can enter a database for searching, which is easier and more efficient than simply performing keyword searches.
3. Multilingual Automatic Summarization
Automatic summarization can be divided into single-document and multi-document types. Summaries may be driven by specific queries or may provide the main content of a document (or document set); different purposes lead to different types of summaries. For example, informative summaries are a compressed version of important facts in the input text (such as an abstract of a journal paper). Summaries may also merely indicate the topics in the input text without providing further details (such as keywords in scientific papers).
Another type of summary takes the form of commentary; such a summary generally provides viewpoints by comparing documents similar to the input text. Detailed summaries provide more details about a large document or multiple related documents and can help users navigate such documents or related collections, such as Wikipedia.
More fundamentally, based on how they are produced, we can divide summaries into extracts and abstracts. Extracts summarize by pulling out the most important parts of the document, possibly along with a small amount of secondary material. Abstracts restate the content of the document and do not necessarily contain the original sentences verbatim. Most current automatic summarization systems produce extracts, but some systems attempt to generate abstracts, or to compress sentences so as to retain only the important parts of each sentence (or of larger units).
Recent research topics include summarization of book catalog information, update summarization (reporting only the latest changes in a developing event), and guided summarization, which aims to extract semantic information from source documents according to the document’s category (e.g., accidents or natural disasters).
Multilingual automatic summarization inherits the characteristics and challenges of monolingual summarization and adds an additional dimension. Roughly defined, multilingual automatic summarization is automatic text summarization that involves more than one language. For example, a summarization system may process a source in one language (e.g., Arabic) and present the summary in a target language (e.g., Chinese). We refer to this specific case as cross-linguistic summarization.
There is also a more complex setting, called cross-language summarization, in which the source documents are in multiple languages and the summary is presented in one (or more) target languages. This is a more challenging task because it must integrate source documents from different languages. All multilingual summarization, whether it involves two or more source or target languages, faces many issues.
The first issue is cross-document coreference resolution: named entities are often rendered differently in different languages, so the summarization system must normalize these variants and map them to the same entity. Another issue is that different languages often use different discourse structures, and their discourse relations may differ, so generating a coherent summary in the target language is also challenging. An even more complex problem is how to summarize related concepts; for example, summarizing legal matters across languages is extremely difficult.
Many of the above issues already exist in monolingual automatic summarization, but due to the different references, discourse structures, and concepts in different languages, these problems are exacerbated in multilingual summarization.
Traditional Automatic Summarization Methods
Automatic summarization aims to meet users’ information needs by extracting and modifying material from the source document to create a shorter text that reflects the source’s content. If the short text is extracted word for word (or with minimal modification), the summary is called an extract; if it restates the content of the source document (distilling its essence), it is called an abstract.
A large amount of research focuses on how to address user needs, leading to different types of summarization tasks, such as multi-document automatic summarization and query-based automatic summarization. Multi-document automatic summarization summarizes from multiple documents involving the same topic, while query-based automatic summarization summarizes based on user queries without providing general-purpose summaries; query-based summarization can be based on single documents or multiple documents.
In general, each automatic summarization system can be divided into three steps:
1) Analysis. Analyze the source text to generate some internal representations. This representation can be a set of feature vectors (e.g., counts of the most common words in sentences) or a logical representation describing its content. For a cross-linguistic system, this part is particularly important, as this representation must have some compatibility across different languages.
2) Transformation. Prune and compress the internal representation generated in the first step (e.g., ranking sentences based on a scoring function). Similarly, the transformation may be language-dependent depending on how the internal representation is selected.
3) Realization (generation). The goal of summarization is to produce a text shorter than the source document. A simple method is to output the n highest-scoring sentences according to the scoring function, but generating a coherent summary requires additional operations (e.g., coreference resolution). If a multilingual summarization system does not address multilingual issues in the earlier stages, it must use a machine translation component to express the summary in the target language.
Research on automatic summarization can be traced back to the late 1950s with Luhn’s work. Luhn investigated the impact of common terms in sentences and proposed a scoring function to calculate the score of each sentence in a document. Other early automatic summarization systems primarily extracted important sentences based on surface features. Generally speaking, sentences at the beginning (or end) of a document are often very important. Therefore, the position of sentences in the document is often a good feature to determine the importance of a sentence, as authors tend to place important sentences in prominent positions in the article.
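A minimal sketch of Luhn-style extractive scoring follows, assuming nothing more than whitespace tokenization and a small stop-word list chosen here for illustration: each sentence is scored by the corpus frequencies of its content words, and the n highest-scoring sentences are returned as the extract.

```python
# Minimal Luhn-style extract: score each sentence by the frequency of its
# content words and keep the n highest-scoring sentences, in document order.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "was", "to"}

def luhn_extract(sentences, n=2):
    words = [w.lower() for s in sentences for w in s.split()
             if w.lower() not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        return sum(freq[w.lower()] for w in sentence.split()
                   if w.lower() not in STOP_WORDS)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n])            # restore original document order
    return [sentences[i] for i in keep]

doc = [
    "Automatic summarization produces a short text from a source document.",
    "The weather was pleasant that day.",
    "Extractive summarization selects the most important sentences of the document.",
]
print(luhn_extract(doc, n=2))
```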
Many early methods, as well as some recent approaches, often use the following features to extract sentences:
1) Indicative phrases such as “in conclusion”;
2) Distribution of terms;
3) Overlap of words with the title;
4) Sentence positions in the text, paragraphs, etc.
These general features can also be easily applied to multilingual automatic summarization. The distribution of terms and positions are mostly language-independent features, where positional information is genre-related. For example, important sentences in news articles are generally at the beginning, while legal texts often summarize information at the end. Summaries generated based on feature extraction methods are not always coherent, so to address this issue, automatic summarization systems will incorporate discourse theory that predicts coherence.
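The surface features listed above can be combined into a single sentence score; the sketch below does this with hand-picked weights and a tiny cue-phrase list, both of which are illustrative assumptions rather than values from the literature.

```python
# Toy combination of the surface features named above: cue phrases, word
# overlap with the title, and sentence position. Weights are illustrative.
CUE_PHRASES = ("in conclusion", "in summary", "to sum up")

def surface_score(sentence, index, total_sentences, title):
    s = sentence.lower()
    cue = 1.0 if any(p in s for p in CUE_PHRASES) else 0.0
    title_overlap = len(set(s.split()) & set(title.lower().split()))
    # News-style position feature: earlier sentences score higher.
    position = 1.0 - index / max(total_sentences - 1, 1)
    return 2.0 * cue + 1.0 * title_overlap + 1.5 * position

title = "Automatic summarization of news articles"
sents = [
    "Automatic summarization shortens news articles.",
    "Some readers prefer long stories.",
    "In conclusion, summarization saves reading time.",
]
for i, s in enumerate(sents):
    print(round(surface_score(s, i, len(sents), title), 2), s)
```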
Challenges of Multilingual Automatic Summarization
1. Token Segmentation. This is the first issue to overcome when building multilingual automatic summarization systems, because different languages mark word boundaries differently (a small tokenization sketch follows this list). For example, English uses spaces and punctuation to delimit tokens, but languages such as Chinese have no spaces between tokens and require a more complex tokenizer to segment a continuous input. In English a token is roughly a word, but this is not necessarily the case in other languages. Still other languages (such as Arabic) have very rich morphology and therefore require fine-grained handling at the morpheme level.
2. Coreference Resolution. Identifying coreferential expressions (e.g., pronouns, discourse markers, restrictive noun phrases) helps make summaries more cohesive. Techniques exist for monolingual summarization, but the multilingual case faces additional challenges, such as names being written in different forms across languages and discourse markers carrying different semantics.
3. Discourse Structure. Recognizing document structure helps improve the coherence of summaries, but different languages structure their texts differently.
4. Machine Translation. When designing a multilingual automatic summarization system, designers must decide at what point machine translation should be applied.
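As a small illustration of the token-segmentation issue raised in point 1 above, the sketch below contrasts whitespace splitting for English with dictionary-based segmentation for Chinese; it assumes the third-party jieba package is installed, and the example strings are chosen here purely for illustration.

```python
# Sketch of the token-segmentation difference: English tokens are roughly
# delimited by spaces, while Chinese text has no spaces and needs a segmenter.
# Assumes the third-party "jieba" package (pip install jieba) is available.
import jieba

english = "Automatic summarization is useful"
chinese = "自动文摘很有用"

print(english.split())           # ['Automatic', 'summarization', 'is', 'useful']
print(list(jieba.cut(chinese)))  # e.g. ['自动', '文摘', '很', '有用']
```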
Evaluation of Automatic Summarization
Determining the quality of the summaries produced by an automatic summarization system is itself a major challenge in summarization research. Evaluation methods fall into two categories: extrinsic (external) evaluation and intrinsic (internal) evaluation. Extrinsic evaluation assesses the summarization system indirectly, through performance on another information processing task that uses the system’s output. Intrinsic evaluation assesses the quality of the summaries directly and can be used at various stages of the development cycle.
Intrinsic evaluation generally measures the coverage of the evaluated summary against a reference summary. Reference summaries are usually produced manually and serve as the gold standard for comparison. When evaluating an automatic summarization system, the evaluated summary is the system output; when analyzing the quality of reference summaries themselves, it can also be a manually written summary.
Depending on whether the evaluation is carried out by humans or by machines, evaluation methods can also be divided into manual and automatic evaluation. If the evaluation is easy to set up and administer and does not need to be repeated, manual evaluation is the better choice; if human resources are limited, automatic evaluation should be adopted.
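For the automatic intrinsic evaluation just mentioned, a common choice is word-overlap recall against the reference summary (the idea behind ROUGE-style metrics); the sketch below computes a simple unigram-recall coverage score and is only a rough stand-in for the metrics used in practice.

```python
# Rough intrinsic evaluation: unigram recall of the reference summary,
# i.e. the fraction of reference words also covered by the system summary.
from collections import Counter

def unigram_recall(system_summary, reference_summary):
    sys_counts = Counter(system_summary.lower().split())
    ref_counts = Counter(reference_summary.lower().split())
    overlap = sum(min(c, sys_counts[w]) for w, c in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)

print(unigram_recall("the cat sat on the mat",
                     "the cat was on the mat"))  # 5/6 ≈ 0.83
```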
4. Question Answering Systems
Question answering systems can retrieve the answers users need from an information repository. Most traditional information retrieval systems adopt a keyword search paradigm. Compared to simply using keyword searches, question answering systems provide a more intuitive way to ask questions in natural language, making the expression clearer. In addition, while information retrieval systems respond to user queries in the form of articles or documents, question answering systems can provide answers that are both accurate and relevant to the topic.
The most actively researched question type has been the fact-based (factoid) question, which seeks a precise answer, typically a named entity (e.g., What is the capital of Turkey?). List questions ask for a list of such factual answers (e.g., Which countries belong to NATO?). Researchers have also attempted to handle questions with complex answers, such as definition questions, relation questions, and opinion questions.
Among them, definition questions require the system to provide information on a specific topic, including biographies of individuals, such as Who is Einstein? Relation questions, such as What is the relationship between the Taliban and Al-Qaeda? Opinion questions, such as What do people like about IKEA? Here, we mainly discuss methods used for fact-based question answering systems, which can also be adapted to answer list-based questions. Fact-based questions are suitable for illustrating the principles of modern question answering systems, and the algorithmic solutions and evaluation methods for fact-based question answering systems are more mature compared to systems that solve questions with complex answers.
The main challenges for question answering systems stem from the flexibility, richness, and ambiguity of natural language, which often lead to mismatches between the information in the question and the answer-bearing text. Although simple keyword matching can identify the correct answers to many questions, common-sense knowledge and logical reasoning, of the kind developed in recognizing textual entailment (RTE) tasks, are also essential.
Additional challenges arise from temporal expressions and statements, which are time-sensitive. When answering questions like “Which car manufacturer has been owned by VW since 1998?” difficulties may arise. For example, a newspaper article from 1998 only contains the short text “Volkswagen today announced the acquisition of Bentley.” To identify the correct answer, the question answering system must clarify that Volkswagen and VW refer to the same entity (Volkswagen), and that Bentley is a car manufacturer. It also needs to infer that acquisition implies ownership, and that the temporal expression today is consistent with 1998.
Furthermore, it may not be possible to find the answer to a question within a single document; in such cases, it becomes necessary to combine information from multiple resources. For example, for the question: Where is Sony’s headquarters located? Although the document does not explicitly state that Sony’s headquarters is in Japan, two independent documents may mention that the headquarters is in Tokyo, and Tokyo is a city in Japan. Another situation is to break a question down into multiple sub-questions, with the final answer composed of the answers to these sub-questions. For example, “Which country won both the World Cup and the European Championship?” The answer to this question is the intersection of the results of these two competitions.
Architecture
In recent years, although many QA architectures have been adopted, most QA systems are based on a core pipeline that includes components such as question analysis, query generation, search, candidate answer generation, and answer scoring. The basic process is as follows:
1) Question Analysis component uses various techniques to extract syntactic and semantic information from the question, including answer type classification, syntactic and semantic analysis, and named entity recognition;
2) In the Query Generation stage, the information is transformed into a set of search queries, with varying degrees of query expansion; these queries are passed to the Search component to retrieve the required information from the knowledge source;
3) Search results are processed by the Candidate Answer Generation component to derive or extract candidate answers at the desired granularity (such as for fact-based questions or definition questions);
4) The Answer Scoring component evaluates the answers obtained from the previous step and typically merges similar candidate answers; at this stage, the knowledge source can be reused to provide evidence for each candidate answer, resulting in a series of answers ranked by confidence.
Here is an example to illustrate how the typical architecture mentioned above processes a sentence:
1) Input the question in text format: “Which computer scientist invented the smiley?”
2) The question analysis component determines that this question is looking for an answer of type computer scientist, extracting keywords invented and smiley;
3) The query generation component builds a query for the search engine based on the answer type and extracted keywords;
4) With the query constructed in the previous step, the search component retrieves paragraphs from the text corpus (e.g., Web): “The two original text smileys were invented on September 19, 1982 by Scott E. Fahlman at Carnegie Mellon.”
5) In the candidate answer generation stage, named entities are extracted as candidate answers:
September 19, 1982;
Scott E. Fahlman;
Carnegie Mellon;
6) Finally, the answer scoring component uses various features to estimate the confidence score for each candidate answer, including retrieval ranking, the number of candidate occurrences in search results, and whether they match the predicted answer category; we get a score:
Scott E. Fahlman (score: 0.9);
Carnegie Mellon (score: 0.4);
September 19, 1982 (score: 0.3);
7) The highest-scoring candidate “Scott E. Fahlman” is returned as the most likely answer.
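The sketch below strings the five components of the typical architecture together for the smiley example above; the keyword filter, the search step, the candidate regex, and the scoring heuristic are deliberately simplified stand-ins invented here, and a real system would replace each with a full component.

```python
# Toy end-to-end sketch of the typical QA pipeline: question analysis,
# query generation/search, candidate generation, and answer scoring.
import re

CORPUS = [
    "The two original text smileys were invented on September 19, 1982 "
    "by Scott E. Fahlman at Carnegie Mellon.",
    "Carnegie Mellon is a university in Pittsburgh.",
]

QUESTION_WORDS = {"which", "who", "what", "when", "where", "the", "a"}

def analyze_question(question):
    # Stand-in for answer typing and keyword extraction.
    keywords = [w for w in re.findall(r"[a-z]+", question.lower())
                if w not in QUESTION_WORDS]
    return {"answer_type": "PERSON", "keywords": keywords}  # type unused below

def search(keywords):
    # Stand-in search component: rank passages by keyword overlap.
    return sorted(CORPUS,
                  key=lambda p: sum(k in p.lower() for k in keywords),
                  reverse=True)

def generate_candidates(passage):
    # Stand-in candidate generation: capitalized multi-word spans and dates.
    return re.findall(r"[A-Z][a-zA-Z.]*(?: [A-Z][a-zA-Z.]*)+|"
                      r"[A-Z][a-z]+ \d{1,2}, \d{4}", passage)

def score_candidate(candidate, passage, keywords):
    # Stand-in scoring: keyword overlap of the passage plus a bonus when the
    # candidate directly follows "by", a crude cue for the agent of "invented".
    score = sum(k in passage.lower() for k in keywords)
    if "by " + candidate in passage:
        score += 2
    return score

question = "Which computer scientist invented the smiley?"
analysis = analyze_question(question)
best_passage = search(analysis["keywords"])[0]
candidates = generate_candidates(best_passage)
ranked = sorted(candidates,
                key=lambda c: score_candidate(c, best_passage, analysis["keywords"]),
                reverse=True)
print(ranked[0])  # Scott E. Fahlman
```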
Most question answering systems follow this typical architecture, although some introduce variations, adding components or altering the flow between them. The START question answering system, for example, differs from the typical architecture in that it decomposes complex questions and answers them in a nested manner: Where was the 20th U.S. president born? can be answered by first retrieving the president’s name and then using that name to find his birthplace.
Modern question answering systems often employ a multi-strategy approach to answer a question, where several independent algorithms run in parallel and then combine their results. This parallel approach can be computationally expensive, but it has proven to be very effective, as multiple components can complement and reinforce each other.
QA systems typically rely on existing information retrieval engines to search local document sets or the web for relevant documents or paragraphs. Thus, the question analysis component and the query generation component can be seen as the preprocessing stage that transforms natural language questions into abstract sets of queries that can be used for potential search engines. The candidate generation component and the answer scoring component can be viewed as the post-processing stage that generates accurate answers from the search results.
Note that some systems do not rely on traditional text search at all. For example, Clifton and Teahan automatically extract knowledge relations from text resources and store them in a knowledge base; more recently, the Wolfram Alpha system uses a human-curated knowledge base as its answer source. At run time, such a QA system matches questions against items in the knowledge base rather than searching unstructured text.
Cross-Language Question Answering
In cross-language question answering, the language of the question differs from the language of the knowledge source. When extending a monolingual system to the task of cross-language systems, developers can either translate the source document into the language of the question or translate the question and keywords into the language of the source document. The debate over which method is more effective originates from the information retrieval community, but it is difficult to provide conclusive evidence. Because each translation direction requires different machine translation systems, the performance differences between the two methods may stem from the methods themselves or from the differences in machine translation systems.
Both translation directions have been implemented successfully in QA systems. A general argument for source-document translation is robustness against machine translation errors: if an important keyword in the question is mistranslated, the correct answer cannot be selected, whereas the source text often contains several relevant passages, and it is enough for one of them to be translated correctly. In addition, source-document translation can be done offline during preprocessing, so it incurs no extra cost at runtime and requires no modification of the typical QA architecture.
On the other hand, if the source text is large or questions in many languages must be supported, offline translation can be expensive, so researchers may have to fall back on faster but less accurate machine translation algorithms. Furthermore, source translation is only feasible when the source text can be stored and indexed locally, which does not apply to Web search. Finally, as machine translation systems improve, the translated sources also need to be re-translated.
For some QA tasks, it is feasible to translate the entire source text. However, it is more common to translate the question during question analysis or to translate keywords and phrases extracted from the question. When translating the entire question, ambiguity between words can be eliminated based on context. Additionally, the syntactic mapping of the question can select syntactically similar sentences in the source language’s corpus. On the other hand, if the question is complex, it can be challenging to find an accurate and syntactically correct translation; in such cases, translating individual keywords may be more effective.
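To make the two translation directions concrete, the sketch below wires a monolingual QA pipeline behind either offline document translation or query-time question translation; the translate() and monolingual_qa() functions are hypothetical placeholders for a real machine translation system and the pipeline described earlier, and the overall structure is an assumption made here.

```python
# Sketch of the two cross-language QA strategies. `translate` is a
# hypothetical placeholder for an MT system; `monolingual_qa` stands in
# for the QA pipeline described above, operating over one language.

def translate(text, source_lang, target_lang):
    raise NotImplementedError("plug in a real machine translation system here")

def monolingual_qa(question, corpus):
    raise NotImplementedError("plug in the QA pipeline described above")

# Strategy 1: translate the source documents offline, then run QA in the
# question language. Robust to per-document MT errors, but requires local
# storage of the translated corpus and re-translation as MT improves.
def qa_with_document_translation(question, corpus, corpus_lang, question_lang):
    translated_corpus = [translate(d, corpus_lang, question_lang) for d in corpus]
    return monolingual_qa(question, translated_corpus)

# Strategy 2: translate the question (or just its keywords) at query time and
# run QA in the corpus language; cheaper, but a single mistranslated keyword
# can make the correct answer unreachable.
def qa_with_question_translation(question, corpus, corpus_lang, question_lang):
    translated_question = translate(question, question_lang, corpus_lang)
    return monolingual_qa(translated_question, corpus)
```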
Another challenge in cross-language QA between European and Asian languages, regardless of the translation method used, arises from the translation of proper nouns. For example, a British person’s name may be rendered in completely different ways in Japanese text, transcribed in the katakana writing system or written in romaji, and if the name is transcribed in katakana there are often several spelling variants. This ambiguity can be handled either by considering multiple translations when retrieving and matching relevant texts, or by replacing expressions in the source text that refer to the same entity with a canonical form during preprocessing. The latter method avoids runtime computational overhead but requires identifying expressions that refer to the same entity and mapping them to a unique representation in advance.
Current and Future Challenges
We have seen that question answering systems typically use simple statistical models and heuristic methods to extract candidate answers and rank them. This technique is suitable for situations where the source text is redundant and contains many answer instances; it is common to use this technique when the source text is large or the question is about current hot topics. However, if the source text is not redundant, more complex query expansion techniques may be needed to retrieve documents and paragraphs containing answers, and deeper NLP and reasoning techniques are essential for identifying answers and finding justifications.
In extreme cases of semantic matching and textual entailment, there may be only one passage in the entire source corpus that contains the answer, and the question answering system must determine whether that passage entails it. Often the semantic relationship between the question and the text passage is not obvious and can only be uncovered through term matching based on ontologies and relatedness measures, structural matching over precise syntactic and semantic analyses, and logical reasoning with world knowledge.
Although techniques for extracting answers from knowledge sources are applicable to most fact-based questions, when answers are not explicitly present in the resources but must be inferred from other statements, these techniques may fail. For example, resolving temporal expressions relative to the publication date of an article (e.g., yesterday, the government announced…) or performing unit conversions, summing values (e.g., What is the combined net worth of the top ten billionaires in dollars?) is essential. Pure candidate extraction techniques are insufficient for questions with complex answers, as it is crucial to compose a natural, coherent, and non-redundant paragraph, and answers need to be inferred from facts across multiple documents.
For a question answering system, generating and scoring candidate answers is far from sufficient; reliable confidence estimates for the top-ranked answers are also needed. If the best existing answer is unlikely to be correct, it may be better to inform the user that the question cannot be answered. Incorrect answers can undermine user trust in the reliability of document systems, thus affecting the system’s usefulness to users.
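One simple way to act on this point is a confidence threshold on the top-ranked answer; the sketch below returns a “no answer” signal when the best score is too low, with the threshold value being an arbitrary choice made here for illustration.

```python
# Toy confidence gate: refuse to answer when the top-ranked candidate's
# confidence falls below a threshold (0.5 here is an arbitrary choice).
def answer_or_abstain(ranked_candidates, threshold=0.5):
    """ranked_candidates: list of (answer, confidence), best first."""
    if not ranked_candidates or ranked_candidates[0][1] < threshold:
        return None  # tell the user the question cannot be answered
    return ranked_candidates[0][0]

print(answer_or_abstain([("Scott E. Fahlman", 0.9), ("Carnegie Mellon", 0.4)]))
print(answer_or_abstain([("Carnegie Mellon", 0.3)]))  # None -> abstain
```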
Confidence estimates are also very important for list-based questions, as the number of correct answers is unpredictable and depends on how many instances the system returns. In current systems, confidence estimates are often inconsistent across questions, depending on various factors such as answer type, redundancy of resources, or length of the question.
Cross-language question answering systems, such as those evaluated at NTCIR and CLEF, have taken the first steps toward mature multilingual systems. Current systems can translate questions into the language of the information source and produce answers in that language, but the answers are not translated back into the language of the question. Users who are not proficient in the source language would want the full loop: they ask questions in their own language, the system searches knowledge sources in various languages, and the answers are returned in the language of the question so that they can be understood.
The evaluation forums such as TREC, NTCIR, and CLEF have undoubtedly driven the technological advancement of QA, but they also lead to specific solutions for particular relevant tasks that often cannot easily adapt to new domains and real-world applications. To promote the practical application of QA technology, future research should focus on general question answering algorithms and techniques, so that they can adapt more quickly to new tasks and achieve high performance in different domains.
While most research to date has focused on fact-based and list questions, questions with complex answers, such as definition, relation, and opinion questions, have recently received more attention. However, question answering systems are not yet as effective at providing complex answers. Improvements in question answering algorithms, along with consistent automatic evaluation methods, are needed to bring complex question answering up to the requirements of practical applications. Especially challenging are how and why questions, which require finding explanations or justifications, and yes/no questions, which require the system to determine whether the combined knowledge of the available information sources entails a hypothesis. Effective algorithms for such questions remain to be developed.
Source: Jiang Jiang Diao