Application of Factored Models in English-Latvian Statistical Machine Translation System (2009-2012)

Aim of the project. The project aims to investigate the main factors that can improve the performance of the English-Latvian phrase-based statistical machine translation (SMT) system developed in the project Evaluation of statistical Machine Translation methods for English-Latvian translation system (2005-2008), and to integrate different features (factors) into the baseline system to improve translation quality and widen the domain of application.

Expected results. Several theoretical and practical results are planned in the project. The main theoretical results will be an analysis of the translation quality of the baseline system, recommendations for the development of English-Latvian factored SMT, and an evaluation of factored models. The main practical result will be a prototype of a factored English-Latvian statistical MT system.

Tools and resources. We use GIZA++ for alignment, SRILM for language models, the Ailab tagger for Latvian text annotation and the Moses decoder for baseline and factored SMT. For training we use the JRC-Acquis parallel corpus (version 3.0), the DGT-TM corpus (2007), the EMEA corpus and some small additional corpora.
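
As an illustration of how tagger output can be prepared for factored training, the sketch below joins the factors of each token with a vertical bar, which is the input convention that factored decoders such as Moses conventionally read (for example surface|lemma|tag). The function name to_factored, the choice of three factors, and the coarse ADJ/NOUN tags are assumptions made for this example; they are not the Ailab tagger's actual output format or tagset.

```python
from typing import List, Tuple

# One token = (surface form, lemma, morphological tag).
# The three-factor layout is an assumption chosen for illustration.
Token = Tuple[str, str, str]


def to_factored(tokens: List[Token], sep: str = "|") -> str:
    """Render one tagged sentence as a line of factored text.

    Each token becomes its factors joined by `sep`; tokens are
    separated by single spaces.
    """
    parts = []
    for factors in tokens:
        for factor in factors:
            # A factor must be non-empty and must not contain the
            # separator or whitespace, otherwise the line is ambiguous.
            if not factor or sep in factor or any(c.isspace() for c in factor):
                raise ValueError(f"invalid factor: {factor!r}")
        parts.append(sep.join(factors))
    return " ".join(parts)


# Hypothetical tagger output for the Latvian phrase "lielas mājas".
sentence = [
    ("lielas", "liels", "ADJ"),
    ("mājas", "māja", "NOUN"),
]

factored_line = to_factored(sentence)
print(factored_line)
```

Writing one such line per sentence, in the same order as the parallel English side, keeps the corpus aligned line by line for the later alignment and training steps.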

Project coordinator: Dr. Inguna Skadiņa

Related publications:
  1. Skadiņa I., Levāne-Petrova K., Rābante G. 2012. Linguistically Motivated Evaluation of English-Latvian Statistical Machine Translation. // Proceedings of the Fifth International Conference Baltic HLT 2012, IOS Press, Frontiers in Artificial Intelligence and Applications, Vol. 247, pp. 221-229.
  2. Skadiņa I., Virza M., Pretkalniņa L. 2012. Angļu-latviešu statistiskās mašīntulkošanas sistēmas izveide: metodes, resursi, un pirmie rezultāti. // Baltistica VIII priedas, Vilnius, pp. 155-168.
  3. Khalilov M., Fonollosa J., Skadiņa I., Brālītis E., Pretkalniņa L. 2010. Towards Improving English-Latvian Translation: A System Comparison and a New Rescoring Feature. // Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), May 19-21, 2010, Valletta, Malta, pp. 1719-1725.
  4. Skadiņa I., Brālītis E. 2009. English-Latvian SMT: knowledge or data? // Proceedings of the 17th Nordic Conference on Computational Linguistics NODALIDA, May 14-16, 2009, Odense, Denmark, NEALT Proceedings Series, Vol. 4 (2009), pp. 242–245.
  5. Skadiņa I., Brālītis E. 2008. Experimental Statistical Machine Translation System for Latvian. // Proceedings of the 3rd Baltic Conference on HLT, Vilnius, pp. 281-286.
  6. Skadiņa I. 2005. Studies of English-Latvian Legal texts for Machine Translation. // Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora, Continuum, pp. 188-195.
  7. Skadiņa I. 2004. Machine Translation for Latvian. // Proceedings of First Baltic Conference „Human Language Technologies – the Baltic Perspective”, Riga, pp. 102-106.

The project is funded by the Latvian Council of Science.

Artificial Intelligence Laboratory
Institute of Mathematics and Computer Science
University of Latvia