We all rely on search engines to navigate the massive amount of online information published every day. Modern search engines not only retrieve a list of pages relevant to a query but often try to directly answer our questions by analyzing the content of those pages. One area they currently struggle at, however, is multi-hop Question Answering that requires reasoning with information taken from multiple documents to arrive at the answer. For example, suppose that we want to find out What is the size of the COVID-19 virus?. Systems that try to answer this question first need to identify the virus responsible for COVID-19 (SARS-CoV-2) and then look for the size of that virus (50-200 nm). These two pieces of information may be mentioned in different documents, making the task challenging for conventional approaches. In this blog post, based on our recent ICLR paper, we describe a fast, end-to-end trainable model for answering such multi-hop questions.
Most of the existing work on question answering (QA) falls in two broad categories. Retrieve + Read systems take the documents returned by standard search engines and run a deep neural network over them to find a span of text that answers the question. However, for multi-hop questions all the relevant documents may not be retrieved by the search engine in the first place, limiting the abilities of these systems. In the example above, the document that talks about the size of SARS-CoV-2 does not share any words with the question and might be missed out.
Another line of work has looked at Knowledge-Based QA, where documents are first converted into a knowledge base (KB) by running information extraction pipelines. A KB is a graphical data structure whose nodes and edges correspond to real-world entities and the relationships between those entities, respectively. Given a sufficiently rich KB, multi-hop questions can be efficiently answered by traversing its edges. However, constructing KBs is an expensive, error-prone and time-consuming process: new information (such as about COVID) first appears on the web in textual forms and adding it to KBs involves a delay. Furthermore, not all information can be expressed using the predefined set of relations allowed by a KB, and hence KBs are usually incomplete.
Motivated by the aforementioned limitations, we explore whether we can directly treat a text corpus as a KB for answering multi-hop questions. At a high level, to achieve this goal we first convert the corpus into a graph-structure similar to a KB while keeping the documents in their original form. We then define a fast, differentiable operation for traversing the edges in the graph, which we call relation following. Given a question, we finally decompose it into a sequence of relations that informs the graph traversal and leads to the answer. Our design ensures that the traversal operation is differentiable, thereby allowing end-to-end training of the entire system. We also introduce an efficient implementation of the traversal operation that scales to the entire Wikipedia in milliseconds!