Advanced language technology
Machine Translation
We have developed a high-performance machine translation system that translates Swahili text into English. Here we describe some characteristics of the system.
Rule-based approach
We believe that careful and detailed analysis of language is the basis for professional language technology. Language is a complex system of communication, governed by a large number of rules. Why not make full use of these rules? We base the technology on linguistic rules and resort to guessing only when no rule can be formulated to control processing.
Modular structure
We compose linguistic applications from carefully designed and thoroughly tested modules. This makes it possible to build high-level applications cost-effectively.
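
One way to picture such modular composition is a pipeline in which each module is a function and an application is a chain of functions. The sketch below is only an illustration of the idea: the helper `compose` and the two stand-in stages are hypothetical names, not modules of the system described here.

```python
from functools import reduce

def compose(*stages):
    """Chain stages left to right: the output of each stage feeds the next."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# Hypothetical stand-in modules; real modules would do linguistic work.
def tokenize(text):
    return text.split()

def lowercase(tokens):
    return [token.lower() for token in tokens]

pipeline = compose(tokenize, lowercase)
print(pipeline("Kitabu Kikubwa"))  # ['kitabu', 'kikubwa']
```

Because every stage has the same simple shape (data in, data out), each one can be tested on its own and reused in other applications.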
High speed
We use finite-state methods in morphological analysis. These methods have proved to be extremely fast, and they are also well suited to describing the morphology of highly inflecting languages.
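
To give a feel for finite-state morphological analysis, here is a minimal sketch of an analyser for a tiny fragment of Swahili verb morphology (subject prefix, tense marker, verb root). The three small lexicons, the tag names and the function `analyse` are assumptions made for this illustration. They are not the system's actual lexicon or tag set, and a production analyser would use a compiled transducer rather than a Python dictionary.

```python
# Toy finite-state analyser: SUBJECT prefix -> TENSE marker -> verb ROOT.
# States are integers; each transition is (morph, tag, next_state).
# The morph lists are illustrative assumptions, not the system's data.
TRANSITIONS = {
    0: [("ni", "1SG", 1), ("a", "3SG", 1), ("wa", "3PL", 1)],
    1: [("na", "PRES", 2), ("li", "PAST", 2), ("ta", "FUT", 2)],
    2: [("soma", "read", 3), ("lala", "sleep", 3)],
}
FINAL_STATES = {3}

def analyse(word, state=0, tags=()):
    """Return every tag sequence whose morphs spell out `word` exactly."""
    if not word:
        # Accept only if the whole word was consumed in a final state.
        return [list(tags)] if state in FINAL_STATES else []
    results = []
    for morph, tag, next_state in TRANSITIONS.get(state, []):
        if word.startswith(morph):
            results.extend(analyse(word[len(morph):], next_state, tags + (tag,)))
    return results

print(analyse("ninasoma"))  # [['1SG', 'PRES', 'read']]
print(analyse("walisoma"))  # [['3PL', 'PAST', 'read']]
```

Each word is analysed in a single left-to-right pass over its morphs, which is one reason finite-state methods are fast on long, highly inflected word-forms.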
For disambiguation and syntactic mapping we use Constraint Grammar rules. Such rules allow a complex set of contextual tests to control when each rule applies.
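
The following sketch shows the general shape of a Constraint Grammar style REMOVE rule: a reading is discarded when a contextual test on a neighbouring word succeeds. It is a simplified illustration, not the rule formalism used in the system. The function `apply_remove`, the tags and the toy readings of the example words are all assumptions. The context test is done in "careful" mode (every reading of the neighbour must carry the tag), and the last remaining reading of a word is never removed.

```python
# Each token is a cohort: (word form, list of candidate readings).
# A reading is a set of tags. Words and tags are illustrative assumptions.
def apply_remove(cohorts, target, offset, context):
    """Remove readings tagged `target` where the cohort at `offset` is
    unambiguously tagged `context`; never remove a word's last reading."""
    result = []
    for i, (form, readings) in enumerate(cohorts):
        j = i + offset
        in_range = 0 <= j < len(cohorts)
        context_ok = in_range and all(context in r for r in cohorts[j][1])
        kept = [r for r in readings if target not in r]
        if context_ok and kept:
            result.append((form, kept))
        else:
            result.append((form, readings))
    return result

# Toy input: the first word is ambiguous between noun and verb.
sentence = [
    ("paka", [{"N"}, {"V"}]),
    ("mkubwa", [{"ADJ"}]),
]
# Roughly: REMOVE V IF the next word is unambiguously ADJ.
print(apply_remove(sentence, target="V", offset=1, context="ADJ"))
```

After the rule applies, only the noun reading of the first word remains, because the following word is unambiguously an adjective.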
Currently the translation speed is about 45,000 words per minute (roughly 750 words per second) on an average home computer.
Mastering multi-word expressions
Multi-word expressions (MWEs) have proved very difficult to handle in applications such as machine translation. We have developed a system for describing various types of MWEs, as well as methods for controlling their behavior in various contexts.
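
As a rough illustration of why MWEs need special treatment, the sketch below matches sequences of lemmas against a small MWE lexicon, preferring the longest match, so that an expression is translated as a unit rather than word by word. The lexicon entries, the function `mark_mwes` and the assumption that the input is already lemmatized are all choices made for this example. The system's own MWE description and its context control are not shown here.

```python
# Hypothetical MWE lexicon keyed by lemma sequences.
# "piga simu" is literally "hit phone" but means "make a phone call".
MWE_LEXICON = {
    ("piga", "simu"): "make a phone call",
    ("piga", "picha"): "take a photo",
}
MAX_LEN = max(len(key) for key in MWE_LEXICON)

def mark_mwes(lemmas):
    """Return (source, gloss) pairs; gloss is None for ordinary words."""
    out = []
    i = 0
    while i < len(lemmas):
        # Try the longest candidate first, down to length 2.
        for n in range(min(MAX_LEN, len(lemmas) - i), 1, -1):
            key = tuple(lemmas[i:i + n])
            if key in MWE_LEXICON:
                out.append((" ".join(key), MWE_LEXICON[key]))
                i += n
                break
        else:
            # No MWE starts here: pass the single word through.
            out.append((lemmas[i], None))
            i += 1
    return out

print(mark_mwes(["piga", "simu", "sasa"]))
# [('piga simu', 'make a phone call'), ('sasa', None)]
```

A real system also has to recognize inflected and discontinuous MWEs, which a plain lookup like this does not attempt.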
Controlling word order
In MT between syntactically very different languages, such as Swahili and English, controlling word order is a major problem. In a rule-based MT system it is possible to define a small set of rules that convert the syntactic constituents of the source language (e.g. noun phrases and verb structures) into the structure required by the target language. Such rules make use of part-of-speech tags and other linguistic tags. Word order control also covers the presence or absence of pronouns in the target language.
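
Two such reordering rules can be sketched as follows, assuming the words have already been glossed into English and tagged. In Swahili the adjective follows the noun (kitabu kikubwa, "big book"), and the subject is marked on the verb (ninasoma, "I am reading"), so English needs a swap in the noun phrase and an added pronoun. The function names, the tag names and the pronoun table are assumptions for this illustration, not the system's rules.

```python
# Hypothetical mapping from subject-agreement tags to English pronouns.
SUBJECT_PRONOUNS = {"1SG": "I", "3SG": "he/she", "3PL": "they"}

def reorder_noun_phrases(tagged):
    """Swap each adjacent (N, ADJ) pair into English (ADJ, N) order."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "N" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out

def add_subject_pronoun(tagged, subject_tag, has_overt_subject):
    """Prepend an English pronoun when the clause has no overt subject."""
    if has_overt_subject:
        return list(tagged)
    return [(SUBJECT_PRONOUNS[subject_tag], "PRON")] + list(tagged)

print(reorder_noun_phrases([("book", "N"), ("big", "ADJ")]))
# [('big', 'ADJ'), ('book', 'N')]
print(add_subject_pronoun([("read", "V")], "1SG", has_overt_subject=False))
# [('I', 'PRON'), ('read', 'V')]
```

This toy handles only a single adjective per noun. Longer noun phrases would need rules that operate on whole constituents.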
Correct word-forms of target language
The word-forms of the target language are produced from the uninflected lexical word of the target language and the grammatical tags inherited from the source language. Here, too, optimization methods have been used to minimize the size of the system and to maximize processing speed.
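
The idea of generating a target word-form from an uninflected word plus inherited tags can be sketched like this. The function `generate`, the tag names and the small exception table are assumptions for illustration, and the rules cover only a few regular English patterns. The optimization methods mentioned above are not represented.

```python
# Hypothetical exception table for irregular English forms.
IRREGULAR = {
    ("child", "PL"): "children",
    ("read", "PAST"): "read",
    ("sleep", "PAST"): "slept",
}

def generate(lemma, tag):
    """Inflect an English lemma according to one grammatical tag."""
    if (lemma, tag) in IRREGULAR:
        return IRREGULAR[(lemma, tag)]
    if tag == "PL":
        # Sibilant endings take -es; everything else here takes -s.
        return lemma + ("es" if lemma.endswith(("s", "x", "ch", "sh")) else "s")
    if tag == "PAST":
        return lemma + ("d" if lemma.endswith("e") else "ed")
    return lemma  # unknown tag: leave the lemma unchanged

print(generate("book", "PL"))    # books
print(generate("child", "PL"))   # children
print(generate("walk", "PAST"))  # walked
```

Here the tag would come from the source analysis: a Swahili plural noun or past-tense verb passes its number or tense tag to the English lemma chosen for it.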