Is Open-Domain Question Answering really "open"?

Although the popular description of Open-Domain QA centers on the fact that its questions are not limited to a specific domain, this is not entirely correct unless the term domain is used in the philosophical sense of a sphere of knowledge, influence, or activity, which is listed as 'sense #4' in the Merriam-Webster dictionary.

However, the term domain also has a mathematical meaning: a ...set of elements to which a mathematical or logical variable is limited; specifically : the set on which a function is defined (see 'sense #5' of the same dictionary). In this sense, it can be said to be equivalent to the CS term data.

As evaluation conferences like TREC teach us, there is always a fixed amount of input data for a QA system. This means that every QA system participating in TREC should be called a closed-domain system if the term domain is used in its fifth sense.

There is, however, a major architectural difference between a system operating on a few hundred documents, such as the famous LUNAR QA system [Woods, 1973], and the systems that participate in recent TREC contests. The difference is clearly due to the size of the input data. While it was possible to manually analyse and hard-code all the input data in the LUNAR project, no one can be expected even to run complex computations over many hundreds of gigabytes of data.

A new approach to categorizing QA systems can be based on the computational complexity of preprocessing the input data. This is best described in Big O notation, where n is the size of the input data:

O(1): constant: the preprocessing effort of the QA system is completely independent of the size of the input data, which can be as large as necessary.
O(n): linear: the QA system can handle fairly large amounts of input data; its preprocessing time grows linearly with the size of the data.
O(xⁿ): exponential: the QA system can handle only a minimal amount of input data; every addition of input data requires an ever-increasing effort to process, which severely limits the size of the input data.
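
The three categories above can be made concrete with a small Python sketch. The cost functions, their names, and the base x = 2 are illustrative assumptions only; they are not measurements of LUNAR or of any TREC system, and the "cost" is an abstract unit of preprocessing effort.

```python
from typing import Callable, Dict

# Hypothetical base for the exponential class; any constant x > 1 behaves similarly.
EXPONENTIAL_BASE = 2


def constant_cost(n: int) -> int:
    """O(1): preprocessing effort does not depend on the input size n."""
    return 1


def linear_cost(n: int) -> int:
    """O(n): preprocessing effort grows in proportion to the input size n."""
    return n


def exponential_cost(n: int) -> int:
    """O(x^n): every added unit of input multiplies the effort by x."""
    return EXPONENTIAL_BASE ** n


# Illustrative mapping from category label to its assumed cost model.
COST_MODELS: Dict[str, Callable[[int], int]] = {
    "O(1)": constant_cost,
    "O(n)": linear_cost,
    "O(x^n)": exponential_cost,
}


def preprocessing_costs(n: int) -> Dict[str, int]:
    """Return the abstract preprocessing cost of each category for input size n."""
    return {label: model(n) for label, model in COST_MODELS.items()}


if __name__ == "__main__":
    # Compare how the three categories scale as the input grows.
    for size in (1, 10, 20, 40):
        print(size, preprocessing_costs(size))
```

Under these assumptions, going from 10 to 40 units of input leaves the constant cost unchanged, quadruples the linear cost, and multiplies the exponential cost by 2³⁰, which is why a system in the last category can only ever cover a small, fixed collection.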

[Woods, 1973] : Woods, W. (1973). Progress in natural language understanding: An application to lunar geology. AFIPS Conference Proceedings, 42, 441-450.
