TransTechno Advanced language technology

Machine Translation
We have developed a high-performance machine translation system for translating Swahili text to English. Here we describe some characteristics of the system.

Rule-based approach

We believe in careful and detailed analysis of language as a basis for professional language technology. Language is a complex system of communication with a large number of rules that guide its use. Why not make full use of these rules? We base the technology on linguistic rules and resort to guessing only when no rule can be formulated to control processing.

Modular structure

We compose linguistic applications using carefully designed and thoroughly tested modules. This makes it possible to compose high-level applications cost-effectively.

High speed

We use finite-state methods in morphological analysis. These methods have proved to be extremely fast, and they are well suited to describing the morphology of highly inflecting languages.
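To illustrate the idea, here is a minimal sketch of a finite-state analyzer for a tiny fragment of Swahili verb morphology. It is not the analyzer used in our system. The slot order (subject prefix, tense marker, root, final vowel), the tag names and the three-verb lexicon are simplifying assumptions chosen for the example, and the function name `analyze` is hypothetical.

```python
# Toy finite-state analyzer: a linear chain of states, each consuming one
# morpheme slot. All tables are illustrative assumptions, not a real lexicon.

SUBJECT = {"ni": "SG1", "u": "SG2", "a": "SG3", "tu": "PL1", "m": "PL2", "wa": "PL3"}
TENSE = {"na": "PRES", "li": "PAST", "ta": "FUT"}
ROOT = {"som": "read", "pik": "cook", "andik": "write"}
FINAL = {"a": "IND"}

# State k of the chain accepts any morpheme from SLOTS[k] and moves to state k + 1.
SLOTS = [SUBJECT, TENSE, ROOT, FINAL]


def analyze(word):
    """Return every analysis of `word` as a list of (morpheme, tag) pairs."""
    results = []

    def step(pos, state, path):
        if state == len(SLOTS):
            # Accept only if the whole word has been consumed.
            if pos == len(word):
                results.append(path)
            return
        for form, tag in SLOTS[state].items():
            if word.startswith(form, pos):
                step(pos + len(form), state + 1, path + [(form, tag)])

    step(0, 0, [])
    return results


print(analyze("ninasoma"))  # ni-na-som-a, "I am reading"
print(analyze("alisoma"))   # a-li-som-a, "he/she read"
```

A production analyzer would compile the lexicon and rules into a single transducer rather than walk Python dictionaries, but the state-by-state consumption of morphemes is the same in principle.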

In disambiguation and syntactic mapping we use Constraint Grammar rules. Such rules can be conditioned on a complex set of contextual tests that control when a rule applies.
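The following sketch shows, in simplified form, how a rule with a contextual test can discard a reading. It imitates a Constraint Grammar rule of the form REMOVE V IF (1 DEM). The `Cohort` class, the `remove` function and the tag names are hypothetical, and the example assumes that "paka" is ambiguous between a noun reading ("cat") and a verb reading ("apply") and that a following demonstrative favors the noun.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """One word together with its competing readings (each a set of tags)."""
    word: str
    readings: list


def remove(cohorts, target, offset, context_tag):
    """Drop readings tagged `target` wherever the cohort at relative position
    `offset` has at least one reading tagged `context_tag`."""
    for i, cohort in enumerate(cohorts):
        j = i + offset
        if not 0 <= j < len(cohorts):
            continue  # the context position falls outside the sentence
        if not any(context_tag in r for r in cohorts[j].readings):
            continue  # the contextual test fails
        kept = [r for r in cohort.readings if target not in r]
        if kept:  # the last remaining reading is never removed
            cohort.readings = kept


# "paka huyu" ("this cat"): the demonstrative to the right removes the verb reading.
sentence = [Cohort("paka", [{"N"}, {"V"}]), Cohort("huyu", [{"DEM"}])]
remove(sentence, "V", 1, "DEM")
print(sentence[0].readings)
```

Real Constraint Grammar rules support far richer tests (negation, scanning, careful contexts, linked conditions); this sketch covers only a single positional test.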

Currently the translation speed is about 45,000 words per minute on an average home computer.

Mastering multi-word expressions

Multi-word expressions (MWEs) have proved to be very difficult to handle in applications such as machine translation. We have developed a system for describing various types of MWEs, as well as methods for controlling their behavior in various contexts.
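As a rough illustration of the simplest case, the sketch below groups contiguous MWEs by longest match over a sequence of lemmas. It is an assumption-laden toy, not our MWE formalism: the table `MWES`, the function `group_mwes` and the two entries ("piga simu", to make a phone call; "piga picha", to take a photograph) are chosen for the example, and discontinuous or context-dependent MWEs are not handled.

```python
# Hypothetical MWE table keyed by lemma sequence; values are English glosses.
MWES = {
    ("piga", "simu"): "call",
    ("piga", "picha"): "photograph",
}
MAX_LEN = max(len(key) for key in MWES)


def group_mwes(lemmas):
    """Scan left to right, preferring the longest MWE that starts at each position.

    Returns a list of (lemma_tuple, gloss) pairs; gloss is None for single words.
    """
    out = []
    i = 0
    while i < len(lemmas):
        # Try candidate lengths from longest to shortest (at least two words).
        for n in range(min(MAX_LEN, len(lemmas) - i), 1, -1):
            key = tuple(lemmas[i:i + n])
            if key in MWES:
                out.append((key, MWES[key]))
                i += n
                break
        else:
            # No MWE starts here: pass the single lemma through unchanged.
            out.append(((lemmas[i],), None))
            i += 1
    return out


print(group_mwes(["piga", "simu", "leo"]))
```

Treating the matched sequence as one unit lets later stages translate it as a whole instead of word by word.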

Controlling word order

In MT between syntactically very different languages, such as Swahili and English, the control of word order is a major problem. In a rule-based MT system it is possible to define a small set of rules that convert various syntactic constituents (e.g. noun phrases and verb structures) of the source language into the structure required by the target language. Such rules make use of part-of-speech tags and other linguistic tags. Word order control also covers the presence or absence of pronouns in the target language.
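A minimal sketch of one such rule is given below. In Swahili, modifiers follow the noun ("kitabu kizuri", literally "book good"), whereas English places them before it ("good book"). The function `reorder_np`, the tag names and the ordering table are assumptions made for the example; the tokens are English glosses paired with part-of-speech tags.

```python
# Hypothetical English pre-nominal order: demonstrative, then adjective.
PRENOMINAL_ORDER = {"DEM": 0, "ADJ": 1}


def reorder_np(tokens):
    """Move modifiers that directly follow a noun to the front of that noun.

    `tokens` is a list of (gloss, tag) pairs in source-language order.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i][1] == "N":
            # Collect the run of modifiers immediately after the noun.
            j = i + 1
            while j < len(tokens) and tokens[j][1] in PRENOMINAL_ORDER:
                j += 1
            modifiers = sorted(tokens[i + 1:j], key=lambda t: PRENOMINAL_ORDER[t[1]])
            out.extend(modifiers)
            out.append(tokens[i])
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out


source_order = [("book", "N"), ("this", "DEM"), ("good", "ADJ")]
print(" ".join(gloss for gloss, _ in reorder_np(source_order)))  # this good book
```

Because the rule refers only to tags, a single rule of this kind covers every noun phrase with the same tag pattern, which is why a small rule set can go a long way.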

Correct word forms in the target language

The word forms of the target language are produced from the uninflected lexical word of the target language together with the grammatical tags inherited from the source language. Here too, optimization methods have been used to minimize the size of the system and to maximize processing speed.
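The sketch below illustrates the principle with a toy English generator that combines an uninflected word with one inherited tag. The function `inflect`, the tag names (`PL`, `PAST`) and the small exception table are assumptions for the example; the spelling rules are deliberately incomplete (consonant doubling, as in "stopped", is not handled).

```python
# Hypothetical exception table consulted before the regular rules.
IRREGULAR = {
    ("write", "PAST"): "wrote",
    ("read", "PAST"): "read",
    ("child", "PL"): "children",
}
VOWELS = "aeiou"


def ends_in_consonant_y(lemma):
    return len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in VOWELS


def inflect(lemma, tag):
    """Produce an English word form from an uninflected word and a grammatical tag."""
    if (lemma, tag) in IRREGULAR:
        return IRREGULAR[(lemma, tag)]
    if tag == "PL":
        if lemma.endswith(("s", "x", "z", "ch", "sh")):
            return lemma + "es"
        if ends_in_consonant_y(lemma):
            return lemma[:-1] + "ies"
        return lemma + "s"
    if tag == "PAST":
        if lemma.endswith("e"):
            return lemma + "d"
        if ends_in_consonant_y(lemma):
            return lemma[:-1] + "ied"
        return lemma + "ed"
    return lemma  # unknown tag: return the word unchanged


print(inflect("book", "PL"), inflect("cook", "PAST"), inflect("write", "PAST"))
# books cooked wrote
```

Keeping irregular forms in a compact table and deriving everything else by rule is one simple way to keep such a generator small and fast.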