papers

A case for XTM 3.0

This was written for TMRA'08 together with Rani Pinchuk and Xuân Baldauf.

This paper suggests improvements to XTM 2.0. It first defines a set of criteria for evaluating those improvements, then presents the suggestions
themselves: align element names with the names used in TMDM, and reduce the number of elements by introducing mixed content and using attributes wherever possible. Finally, it discusses some relevant irregularities.

See the paper in the attachment.


Learning foci for Question Answering over Topic Maps

The paper on Question Answering that I co-wrote with Rani Pinchuk and Tiphaine Dalmas for ACL-IJCNLP'09.

Abstract

This paper introduces the concepts of asking point and
expected answer type as variations of the question focus.
They are of particular importance for QA over semi-structured data, as
represented by Topic Maps, OWL or custom XML formats. We describe an approach
to the identification of the question focus from questions asked to a
Question Answering system over Topic Maps by extracting the asking point
and falling back to the expected answer type when necessary.
We use known machine learning techniques for expected answer type
extraction and we implement a novel approach to the asking point
extraction. We also provide a mathematical model to predict the performance
of the system.
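
The fallback described in the abstract can be illustrated with a minimal sketch. It is not the method from the paper, which uses machine learning for both steps; here a toy dictionary lookup stands in for the learned extractors, and the names `Focus`, `EAT_BY_WH`, `extract_asking_point` and `question_focus` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, List


@dataclass
class Focus:
    kind: str   # "asking_point" or "expected_answer_type"
    value: str


# Hypothetical toy mapping from a question word to a coarse answer type.
EAT_BY_WH = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}


def extract_asking_point(tokens: List[str], topic_names: Set[str]) -> Optional[str]:
    """Return the first token that names a topic type in the map, if any."""
    for tok in tokens:
        if tok in topic_names:
            return tok
    return None


def question_focus(question: str, topic_names: Set[str]) -> Focus:
    """Prefer the asking point; fall back to the expected answer type."""
    tokens = question.lower().rstrip("?").split()
    asking_point = extract_asking_point(tokens, topic_names)
    if asking_point is not None:
        return Focus("asking_point", asking_point)
    eat = EAT_BY_WH.get(tokens[0], "OTHER") if tokens else "OTHER"
    return Focus("expected_answer_type", eat)
```

With the assumed topic types `{"librettist", "composer"}`, the question "Who is the librettist of Tosca?" yields the asking point `librettist`, while "Who wrote Tosca?" contains no topic type and falls back to the expected answer type `PERSON`.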

See the paper in the attachment.


Occasional XSLT for Experienced Software Developers

First published in 2004 in DevX.

Although using XSLT to process XML is increasingly common, most developers still use it only occasionally—and often treat it as just another procedural language. But that's not the best way to use XSLT. Learn how to simplify and improve your XSLT processing using event-driven and declarative techniques.

XML appears in some form in most modern applications—and often needs to be transformed from one form into another: merged, split, massaged, or simply reformatted into HTML. In most cases, it's far more robust and efficient to use XSLT to perform such transformations than to use common programming languages such as Java, VB.NET, or C#. But because XSLT is an add-on rather than a core language, most developers use XSLT only occasionally, and have neither time nor resources to dive into the peculiarities of XSLT development or to explore the paradigms of functional and flow-driven programming that efficient use of XSLT requires.

Such occasional use carries the danger of misapplying programming techniques that suit mainstream languages such as Java, C and Python but can lead to disastrous results when applied to XSLT.

However, you can avoid the problems of occasional use by working through this set of simple, thoroughly explained exercises, each of which applies a well-known programming problem to an XSLT programming task.

Some methods of automatic natural language analysis used in industrial products

An example of text in which each word is assigned a part of speech.
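
As a small illustration of such text, the sketch below prints a sentence in the common word/TAG notation. The sentence and its Penn Treebank tags are assigned by hand for this example; they are not the output of a tagger and are not taken from the original article, and the helper `format_tagged` is a hypothetical name.

```python
# Hand-assigned Penn Treebank tags, for illustration only (not tagger output).
tagged = [
    ("The", "DT"),   # determiner
    ("cat", "NN"),   # singular noun
    ("sat", "VBD"),  # past-tense verb
    ("on", "IN"),    # preposition
    ("the", "DT"),
    ("mat", "NN"),
]


def format_tagged(pairs):
    """Render (word, tag) pairs in word/TAG notation."""
    return " ".join(f"{word}/{tag}" for word, tag in pairs)


print(format_tagged(tagged))  # The/DT cat/NN sat/VBD on/IN the/DT mat/NN
```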

A description of state-of-the-art natural language processing technologies. Topics covered include POS tagging, text parsing and automatic text summarization. Information on many successful linguistic enterprises and research groups is also provided.

Introduction

Research and development in automatic text processing in Europe and the USA attract the attention of the largest private companies and of government organizations at the highest level. For several years the European Union has been coordinating various programmes in automatic text processing, for example the Human Language Technology Sector of the Information Society Technologies (IST) Programme 1998 - 2000. One of the most interesting projects within this programme is SPARKLE (Shallow PARsing and Knowledge Extraction for Language Engineering). Its participants include Daimler-Benz, Xerox Research Centre in Europe and Cambridge University Computer Laboratory. The goal of the project is to create partial (shallow) syntactic parsers for the main languages of the European Union.

In the USA, the TIPSTER project ran from 1991 until the autumn of 1998. It was organized by DARPA, the Department of Defense and the CIA together with the National Institute of Standards and Technology and the Space and Naval Warfare Systems Center (SPAWAR). The FBI, the National Science Foundation and some other organizations also took part in the programme's advisory board. The main goal of the programme was to compare and evaluate the results of various retrieval systems and summarization systems.

It should be noted that tasks such as speech recognition and generation and the building of search systems have so far been solved with minimal involvement from linguists. This is because these tasks are solved mainly with statistical methods.