Machine translation is as old as the field of computational linguistics itself. It is also a problem that has been predicted to "be solved in the next five years" many times, yet it remains unsolved today. Machine translation was first naively considered a simple problem solvable by simple statistical means. It was then studied in depth through complicated but unsuccessful sets of detailed rules that tried to describe every aspect of natural language use. Finally, the field returned to a statistical approach on a completely different level, one that combines linguistic analysis with powerful machine learning algorithms.
The lecture will first describe the basic statistical approach based on information theory, and then today's state-of-the-art phrase-based statistical systems, including new approaches that use deep language analysis.
Prof. RNDr. Jan Hajič, Dr. is a professor at the Institute of Formal and Applied Linguistics at the Faculty of Mathematics and Physics, Charles University in Prague. He currently leads the LINDAT/CLARIN language resources infrastructure project. He is also active in building language resources (annotated corpora, treebanks), both for deep language analysis and for machine translation. He has authored over 170 publications, often with international co-authors. In the 1990s, he was a member of the team building the first modern statistical machine translation system at IBM Research. He also taught at the Computer Science Department of Johns Hopkins University in Baltimore. He has been the Czech co-PI of several EU projects on machine translation (most recently Euromatrix, META-NET, Companions, Khresmoi, Faust, QTLeap, HimL and QT21).
The seminar's program consists of a one-hour lecture followed by a discussion. Each lecture is based on an internationally exceptional or remarkable achievement of the lecturer, presented in a way that is comprehensible and interesting to a broad computer science community. The lectures are given in English.
The seminar is organized by a committee consisting of Roman Barták (Charles University, Faculty of Mathematics and Physics), Jaroslav Hlinka (Czech Academy of Sciences, Computer Science Institute), Michal Chytil, Pavel Kordík (CTU in Prague, Faculty of Information Technologies), Michal Koucký (Charles University, Faculty of Mathematics and Physics), Jan Kybic (CTU in Prague, Faculty of Electrical Engineering), Michal Pěchouček (CTU in Prague, Faculty of Electrical Engineering), Jiří Sgall (Charles University, Faculty of Mathematics and Physics), Vojtěch Svátek (University of Economics, Faculty of Informatics and Statistics), Michal Šorel (Czech Academy of Sciences, Institute of Information Theory and Automation), Tomáš Werner (CTU in Prague, Faculty of Electrical Engineering), and Filip Železný (CTU in Prague, Faculty of Electrical Engineering).
The idea of organizing this seminar emerged from discussions among representatives of several research institutes on how to avoid undesired fragmentation of the Czech computer science community.