An old but still usable phpbb2drupal conversion script

This one ranks high in Google search results for phpbb2drupal.

{syntaxhighlighter brush:sql}
-- Written by Feodor (feodor [at] mundo.ru)
-- Modified for drupal 4.5.2 and phpbb 2.0.11-2.0.13
-- by Alexander Mikhailian
-- This script assumes that the phpbb and drupal tables are kept in
-- one and the same database. Phpbb tables are expected to have the prefix
-- phpbb_ and drupal tables are expected to have the prefix drupal_
-- If the phpBB forum uses CP1251 (or another) encoding, the tables must be
-- converted into UTF8. If the version of MySQL is older than 4.1, the "iconv"
-- command can be used to convert the exported tables into UTF8.
-- Example:
-- iconv -fcp1251 -tutf8 phpbb2_utf-8.sql
-- Here is a list of the phpbb tables used by the script for import into Drupal:
-- phpbb_categories
-- phpbb_forums
-- phpbb_posts
-- phpbb_posts_text
-- phpbb_users
-- phpbb_vote_desc
-- phpbb_vote_results
-- You should probably edit the two variables below to match your setup:

-- The name of the forums taxonomy as it appears on the site
SELECT @forum_title:='Forums';

-- Start importing users from this id. Do not forget that uid 1 is always
-- the Administrator in Drupal. Depending on whether you have already created
-- the Administrator user in Drupal or not, you may want to change this
-- variable to 2.
SELECT @first_phpbb_user_id:=1; # uid 1 is always an administrator in Drupal


A case for XTM 3.0

This was written for TMRA'08 together with Rani Pinchuk and Xuân Baldauf.

This paper suggests improvements to XTM 2.0. First, a set of criteria is defined for evaluating those improvements. The suggestions themselves follow: align element names with the names used in TMDM, and reduce the number of elements by introducing mixed content and using attributes wherever possible. Finally, some relevant irregularities are discussed.

See the paper in the attachment.


Learning foci for Question Answering over Topic Maps

The paper on Question Answering that I co-wrote with Rani Pinchuk and Tiphaine Dalmas for ACL-IJCNLP'09.


This paper introduces the concepts of asking point and
expected answer type as variations of the question focus.
They are of particular importance for QA over semi-structured data, as
represented by Topic Maps, OWL or custom XML formats. We describe an approach
to the identification of the question focus from questions asked to a
Question Answering system over Topic Maps by extracting the asking point
and falling back to the expected answer type when necessary.
We use known machine learning techniques for expected answer type
extraction and we implement a novel approach to the asking point
extraction. We also provide a mathematical model to predict the performance
of the system.
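
The fallback described in the abstract can be illustrated with a small sketch. It is not the method from the paper, which uses machine learning for both steps. The hand-written rules, the function names and the answer type labels below are all assumptions made for illustration only.

```python
import re

# Hypothetical mapping from wh-words to expected answer types.
# The paper learns this step; these rules are illustrative only.
EAT_BY_WH_WORD = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
}

# Toy rule: in "what/which <word> ...", <word> is taken as the asking point.
ASKING_POINT_PATTERN = re.compile(r"^(?:what|which)\s+([a-z]+)\s+\S", re.IGNORECASE)
# Words that directly follow "what/which" but are not asking points here.
NON_NOUNS = {"is", "are", "was", "were", "do", "does", "did", "can"}


def extract_asking_point(question):
    """Return the explicit asking point, or None if the toy rule finds none."""
    match = ASKING_POINT_PATTERN.match(question.strip())
    if match and match.group(1).lower() not in NON_NOUNS:
        return match.group(1).lower()
    return None


def expected_answer_type(question):
    """Return a coarse expected answer type based on the first word, or None."""
    words = question.strip().split()
    first_word = words[0].lower() if words else ""
    return EAT_BY_WH_WORD.get(first_word)


def question_focus(question):
    """Prefer the asking point; fall back to the expected answer type."""
    asking_point = extract_asking_point(question)
    if asking_point is not None:
        return ("asking_point", asking_point)
    answer_type = expected_answer_type(question)
    if answer_type is not None:
        return ("expected_answer_type", answer_type)
    return ("unknown", None)
```

With these toy rules, "Which operas did Puccini compose?" yields the asking point "operas", while "Who composed Tosca?" has no explicit asking point and falls back to the expected answer type PERSON.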

See the paper in the attachment.


Occasional XSLT for Experienced Software Developers

First published in 2004 on DevX.

Although using XSLT to process XML is increasingly common, most developers still use it only occasionally—and often treat it as just another procedural language. But that's not the best way to use XSLT. Learn how to simplify and improve your XSLT processing using event-driven and declarative techniques.

XML appears in some form in most modern applications—and often needs to be transformed from one form into another: merged, split, massaged, or simply reformatted into HTML. In most cases, it's far more robust and efficient to use XSLT to perform such transformations than to use common programming languages such as Java, VB.NET, or C#. But because XSLT is an add-on rather than a core language, most developers use XSLT only occasionally, and have neither time nor resources to dive into the peculiarities of XSLT development or to explore the paradigms of functional and flow-driven programming that efficient use of XSLT requires.

Such occasional use carries the danger of misapplying programming techniques that suit mainstream languages such as Java, C and Python but can lead to disastrous results in XSLT.

However, you can avoid the problems of occasional use by working through this set of simple, thoroughly explained exercises, each of which applies XSLT to a well-known programming problem.

Buying a computer mouse in Minsk

This was written in 2002 and is not true anymore. The computer market is much more civilized now, after the state legalized it for its own profit.

What can be simpler than that? Drop into a supermarket, choose one of the
half a dozen models available and leave with your new mouse, probably not the
cheapest one, but at an acceptable price and of decent quality.

Now, this is the most probable scenario in Warsaw, Teheran or Manila, but you
have to follow a different route in Minsk.

First, you look up the prices of local computer firms on the Internet. There
are several sites that list prices from most of the firms, but the local star is
http://www.kosht.com. Prices are listed one line per item, and each line gives
the name of the company, a short description of the item, its shop price,
gross price and dealer’s price. While very concise, the list can grow quite big:
it actually had ~300 entries for mice and “other manipulators”, as they
say in the computer slang.

Few companies deliver to your door. If you are lucky, your story ends here.
However, most of them sell on the premises only. Once you have decided to visit
them, arm yourself with a map of the town and some patience, as even the big
resellers are not visible from the street. Their offices are
always located deep inside the buildings of half-closed research institutes or
state companies, among the multitude of small law firms, publishing and
printing companies that rent their offices in the same place.


Samuel Beckett. Waiting for Godot

Translation: Alexander Mikhailian
Email: mikhailian@altern.org
Spellcheck: Tatsiana Klimantovich
Date: 20010910

Translator's note:

While I was working with a French troupe that was staging this play, it
turned out that the only existing translation, once published in the journal
"Inostrannaya Literatura", was unsuitable for line-by-line/simultaneous
translation, because it had lost much of the rhythm of the original
text. The new translation pays particular attention to synchronizing the duration

Some methods of automatic natural language analysis used in industrial products

An example of text in which each word is assigned its part of speech.

A description of state-of-the-art Natural Language Processing technologies. Topics covered include POS tagging, text parsing and automatic text summarization. A lot of information on successful linguistic enterprises and research groups is also provided.


Research and development in automatic text processing in Europe and the USA attracts the attention of the largest private firms and of government organizations at the highest level. The European Union has been coordinating various programmes in automatic text processing for several years, for example the Human Language Technology Sector of the Information Society Technologies (IST) Programme 1998 - 2000. One of the most interesting projects within this programme is SPARKLE (Shallow PARsing and Knowledge Extraction for Language Engineering). Its participants include Daimler-Benz, Xerox Research Centre in Europe and Cambridge University Computer Laboratory. The goal of the project is to create partial syntactic parsers for the main languages of the European Union.

In the USA, the TIPSTER project existed from 1991 until the autumn of 1998. It was organized by DARPA, the Department of Defense and the CIA together with the National Institute of Standards and Technology and the Space and Naval Warfare Systems Center (SPAWAR). The FBI, the National Science Foundation and some other organizations also took part in the work of the programme's advisory board. The main goal of the programme was to compare and evaluate the results produced by various information retrieval and summarization systems.

It should be noted that tasks such as speech recognition and generation and the creation of search engines have so far been solved with minimal involvement of linguists. This is because these tasks are tackled mainly with statistical methods.

Debian Etch on a Dell 6400 (E1505)


PCI devices

# lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)