Application of Factored Models in English-Latvian Statistical Machine Translation System (2009-2012)
Aim of the project.
The project aims to investigate the main factors that can improve the performance of the English-Latvian phrase-based statistical machine translation (SMT) system developed in the project Evaluation of statistical Machine Translation methods for English-Latvian translation system (2005-2008), and to integrate different features (factors) into the baseline system to improve translation quality and widen its domain of application.
Several theoretical and practical results are planned in the project. The main theoretical results will be an analysis of the translation quality of the baseline system, recommendations for the development of an English-Latvian factored SMT system, and an evaluation of factored models. The main practical result will be a prototype of a factored English-Latvian statistical MT system.
Tools and resources
We use GIZA++ for word alignment, SRILM for language models, the Ailab tagger for Latvian text annotation, and the Moses decoder for the baseline and factored SMT systems. For training we use the JRC-Acquis parallel corpus (version 3.0), the DGT-TM corpus (2007), the EMEA corpus, and some small additional corpora.
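As an illustration of what "factored" means in practice, the following minimal sketch writes annotated tokens in the factor-separated format read by Moses, where the factors of one token are joined with "|" and tokens are separated by spaces. The function names, the choice of three factors (surface form, lemma, tag), and the example tags are assumptions made for this illustration only; they do not reproduce the Ailab tagger's output format or the project's actual factor configuration.

```python
# Illustrative sketch only: helper names, factor order and tags are hypothetical.
FACTOR_SEP = "|"  # factor separator used in Moses factored corpora


def to_factored(tokens):
    """Turn a list of (surface, lemma, tag) tuples into one factored line."""
    parts = []
    for factors in tokens:
        for factor in factors:
            # Empty factors, spaces or the separator itself would corrupt the line.
            if not factor or " " in factor or FACTOR_SEP in factor:
                raise ValueError(f"invalid factor value: {factor!r}")
        parts.append(FACTOR_SEP.join(factors))
    return " ".join(parts)


def from_factored(line):
    """Split one factored line back into a list of factor tuples."""
    return [tuple(token.split(FACTOR_SEP)) for token in line.split()]


# Hypothetical annotation of an English fragment (Penn-style tags assumed).
example = [
    ("vessels", "vessel", "NNS"),
    ("shall", "shall", "MD"),
    ("be", "be", "VB"),
    ("prohibited", "prohibit", "VBN"),
]

line = to_factored(example)
print(line)
```

Each side of the parallel corpus would be converted this way before training, so that translation and generation steps can refer to individual factors by position.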
Dr. Inguna Skadiņa
Tools and resources developed within the project:
Example in-domain sentences for translation:
- Amendments to the agreement shall be adopted by consensus of all contracting parties.
- Commercial policy measures on exports shall apply at the time of acceptance of the declaration of entry for the procedure.
- Vessels shall be prohibited from using any beam trawl of which the mesh size lies between 32 and 99 millimetres.
- The council of the European Union has adopted this decision.
- The steering committee shall be composed of one representative appointed by each member state.
- This appropriation is intended to cover vehicle maintenance and operating costs and costs relating to the use of public transport.
- Competent authorities receiving confidential information under article 44 may use it only in the course of their duties.
Publications:
- Skadiņa I., Levāne-Petrova K., Rābante G. 2012. Linguistically Motivated Evaluation of English-Latvian Statistical Machine Translation. // Proceedings of the Fifth International Conference Baltic HLT 2012, IOS Press, Frontiers in Artificial Intelligence and Applications, Vol. 247, pp. 221-229.
- Skadiņa I., Virza M., Pretkalniņa L. 2012. Angļu-latviešu statistiskās mašīntulkošanas sistēmas izveide: metodes, resursi, un pirmie rezultāti [Development of an English-Latvian statistical machine translation system: methods, resources, and first results]. // Baltistica VIII priedas, Vilnius, pp. 155-168.
- Khalilov M., Fonollosa J., Skadiņa I., Brālītis E., Pretkalniņa L. 2010. Towards Improving English-Latvian Translation: A System Comparison and a New Rescoring Feature. // Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), May 19-21, 2010, Valletta, Malta, pp. 1719-1725.
- Skadiņa I., Brālītis E. 2009. English-Latvian SMT: knowledge or data? // Proceedings of the 17th Nordic Conference on Computational Linguistics NODALIDA, May 14-16, 2009, Odense, Denmark, NEALT Proceedings Series, Vol. 4 (2009), pp. 242–245.
- Skadiņa I., Brālītis E. 2008. Experimental Statistical Machine Translation System for Latvian. // Proceedings of the 3rd Baltic Conference on HLT, Vilnius, pp. 281-286.
- Skadiņa I. 2005. Studies of English-Latvian Legal texts for Machine Translation. // Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora, Continuum, pp. 188-195.
- Skadiņa I. 2004. Machine Translation for Latvian. // Proceedings of First Baltic Conference „Human Language Technologies – the Baltic Perspective”, Riga, pp. 102-106.
The project is funded by the Latvian Council of Science.
Institute of Mathematics and Computer Science
Artificial Intelligence Laboratory