Use version control software for lawmaking processes

Originally published in

As usual, the US is ahead of the rest of the world, with an initiative that formalizes all existing information pertaining to the country's legal system and makes it available in a useful way. It is important to change the lawmaking processes so that they create data in ways that are immediately usable by services like

The pilot project may combine ⅓ software development and configuration activities with ⅔ training, support and follow-up.

The money is made in long-term maintenance and support projects that will be a joy to run once the shift is made.

Target customers: Ministry of Justice, DG Justice, standards bodies, big companies with evolved internal standards and ever-failing formal verification and validation procedures run over tens if not hundreds of Microsoft Word documents stored on SharePoint servers.

A case for XTM 3.0

This was written for TMRA'08 together with Rani Pinchuk and Xuân Baldauf.

This paper suggests improvements to XTM 2.0. First, a set of criteria is defined for evaluating those improvements. The suggestions themselves follow: align element names with the names used in TMDM, and reduce the number of elements by introducing mixed content and using attributes wherever possible. Finally, some relevant irregularities are discussed.

See the paper in the attachment.


Learning foci for Question Answering over Topic Maps

This is the paper on Question Answering that I co-wrote with Rani Pinchuk and Tiphaine Dalmas for ACL-IJCNLP'09.


This paper introduces the concepts of asking point and expected answer type as variations of the question focus. They are of particular importance for QA over semi-structured data, as represented by Topic Maps, OWL or custom XML formats. We describe an approach to the identification of the question focus from questions asked to a Question Answering system over Topic Maps by extracting the asking point and falling back to the expected answer type when necessary. We use known machine learning techniques for expected answer type extraction and we implement a novel approach to the asking point extraction. We also provide a mathematical model to predict the performance of the system.
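The fallback described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the paper uses machine learning for both steps, whereas the sketch swaps in hand-written rules, and the names `asking_point`, `expected_answer_type`, `question_focus`, the auxiliary-word list and the answer type table are all hypothetical.

```python
import re

# Hypothetical wh-word -> expected answer type table (illustrative only,
# not taken from the paper).
EXPECTED_ANSWER_TYPES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
}

# Words that may follow "what"/"which" without naming the thing asked for.
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can"}


def asking_point(question):
    """Return the word right after 'what'/'which' if it looks like a noun, else None."""
    match = re.match(r"(?:what|which)\s+([a-z]+)", question.strip(), re.IGNORECASE)
    if match and match.group(1).lower() not in AUXILIARIES:
        return match.group(1).lower()
    return None


def expected_answer_type(question):
    """Guess a coarse answer type from the first word of the question."""
    words = question.strip().split()
    first = words[0].lower() if words else ""
    return EXPECTED_ANSWER_TYPES.get(first, "OTHER")


def question_focus(question):
    """Prefer the asking point; fall back to the expected answer type."""
    point = asking_point(question)
    if point is not None:
        return ("asking_point", point)
    return ("expected_answer_type", expected_answer_type(question))
```

With these toy rules, "Which operas did Puccini compose?" yields the asking point `operas`, while "Who composed Tosca?" has no explicit asking point and falls back to the type `PERSON`.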

See the paper in the attachment.


Occasional XSLT for Experienced Software Developers

First published in 2004 in DevX

Although using XSLT to process XML is increasingly common, most developers still use it only occasionally—and often treat it as just another procedural language. But that's not the best way to use XSLT. Learn how to simplify and improve your XSLT processing using event-driven and declarative techniques.

XML appears in some form in most modern applications—and often needs to be transformed from one form into another: merged, split, massaged, or simply reformatted into HTML. In most cases, it's far more robust and efficient to use XSLT to perform such transformations than to use common programming languages such as Java, VB.NET, or C#. But because XSLT is an add-on rather than a core language, most developers use XSLT only occasionally, and have neither time nor resources to dive into the peculiarities of XSLT development or to explore the paradigms of functional and flow-driven programming that efficient use of XSLT requires.

Such occasional use carries the danger of misapplying programming techniques that suit mainstream languages such as Java, C and Python but can lead to disastrous results when applied to XSLT.

However, you can avoid the problems of occasional use by working through this set of simple, thoroughly explained exercises, which apply XSLT to several well-known programming problems.
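To make the contrast with procedural code concrete, here is a rough analogy in Python rather than XSLT itself. It is a sketch, not material from the article: each element name gets its own small rule, and a generic dispatcher walks the tree, which is similar in spirit to pairing `xsl:template` rules with `xsl:apply-templates`. The element names (`article`, `title`, `para`) and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


# One small rule per element name, much like one template per match pattern.
def render_article(node):
    return "<div>" + apply_rules(node) + "</div>"


def render_title(node):
    return "<h1>" + apply_rules(node) + "</h1>"


def render_para(node):
    return "<p>" + apply_rules(node) + "</p>"


def render_default(node):
    # Fallback for elements without a rule: keep the text, drop the tag.
    return apply_rules(node)


RULES = {"article": render_article, "title": render_title, "para": render_para}


def apply_rules(node):
    """Emit the node's text and hand each child to the rule registered for its tag."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(RULES.get(child.tag, render_default)(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def transform(xml_text):
    """Parse the input and start the dispatch at the root element."""
    root = ET.fromstring(xml_text)
    return RULES.get(root.tag, render_default)(root)
```

The point of the design is that no rule contains a loop over a fixed list of expected children: the input drives which rule fires, so supporting a new element means adding one rule rather than editing a central procedure.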

Buying a computer mouse in Minsk

This was written in 2002 and is not true anymore. The computer market is much more civilized now, after the state legalized it for its own profit.

What can be simpler than that? Drop into a supermarket, choose a model out of the half a dozen available and leave with your new mouse, probably not the cheapest one, but at an acceptable price and of decent quality.

Now, this is the most probable scenario in Warsaw, Teheran or Manila, but you have to follow a different route in Minsk.

First, you look up the prices of local computer firms on the Internet. There are several sites that list prices from most of the firms but the local star is Prices are listed one line per item; each line is composed of the name of the company, a short description of the item, its shop price, gross price and dealer’s price. While very concise, the list can grow quite big: it actually had ~300 entries for mice and “other manipulators”, as they say in computer slang.

Few companies deliver to your door. If you are lucky, your story ends here. However, most of them sell only on their own premises. Once you have decided to visit them, equip yourself with a map of the town and some patience, as even the big resellers are not visible from the street. Their offices are always located deep inside the buildings of half-closed research institutes or state companies, among the multitude of small law firms, publishing and printing companies that rent their offices in the same place.

The firm of our choice was located in a building belonging to a transport company. Having wandered a few minutes in the hallways, we found our firm on the 2nd floor, near the ladies’ room.


Samuel Beckett. Waiting for Godot

Translation: Alexander Mikhailian
Spellcheck: Tatsiana Klimantovich
Date: 20010910

Translator's note:

While I was working with a French troupe that was performing this play, it
turned out that the only existing translation, once published in the journal
"Inostrannaya Literatura", was not suitable for surtitles or simultaneous
translation, because it had largely lost the rhythm of the original text. In
the new translation, particular attention is paid to synchronizing the duration

Some methods of automatic natural language analysis used in industrial products

An example of a text in which each word is assigned a part of speech.
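As a stand-in illustration of what such tagged text looks like, here is a toy Python sketch. It assumes a tiny hand-made lexicon and simple lookup; the lexicon, the tag names and the functions `tag` and `format_tagged` are hypothetical, and the industrial taggers discussed in the article are far more sophisticated than this.

```python
# Hypothetical mini-lexicon; a real tagger would be trained on an annotated corpus.
LEXICON = {
    "the": "DET",
    "a": "DET",
    "cat": "NOUN",
    "mat": "NOUN",
    "sat": "VERB",
    "on": "ADP",
}


def tag(sentence):
    """Return (word, tag) pairs; words missing from the lexicon get the placeholder 'X'."""
    return [(word, LEXICON.get(word.lower(), "X")) for word in sentence.split()]


def format_tagged(pairs):
    """Render the pairs in the common word/TAG notation."""
    return " ".join(f"{word}/{label}" for word, label in pairs)


print(format_tagged(tag("The cat sat on the mat")))
```

A lookup like this cannot resolve ambiguous words, which is exactly the problem that real taggers address with context.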

A description of state-of-the-art natural language processing technologies. Topics covered include POS tagging, text parsing and automatic text summarization. Information on successful linguistic enterprises and research groups is also provided.


Research and development in automatic text processing in Europe and the US attract the attention of the largest private companies and of government organizations at the highest level. The European Union has for several years been coordinating various programmes in automatic text processing, for example the Human Language Technology Sector of the Information Society Technologies (IST) Programme 1998 - 2000. One of the most interesting projects within this programme is SPARKLE (Shallow PARsing and Knowledge Extraction for Language Engineering). Its participants include Daimler-Benz, Xerox Research Centre in Europe and Cambridge University Computer Laboratory. The goal of the project is to build partial syntactic parsers for the main languages of the European Union.

In the US, the TIPSTER project ran from 1991 until the autumn of 1998. It was organized by DARPA, the Department of Defense and the CIA together with the National Institute of Standards and Technology and the Space and Naval Warfare Systems Center (SPAWAR). The FBI, the National Science Foundation and some other organizations also took part in the programme's advisory board. The main goal of the programme was to compare and evaluate the results of various retrieval systems and summarization systems.

Debian Etch on a Dell 6400 (E1505)


PCI devices

# lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)