Sixtieth meeting of the Prague Computer Science Seminar

Ondřej Bojar

AI Stolen by Transformers!

Sequence-to-sequence deep learning models, which originated in machine translation (MT), have caused an explosion of public interest in AI, effectively stealing the name of the field for the time being. To MT researchers, this is reminiscent of 2013-2015, when deep learning invaded the field of MT and completely rewrote its methodology.

May 23, 2024

16:15

Lecture hall E-107, FEL ČVUT
Karlovo nám. 13, Praha 2

In my talk, I will mention some of our achievements in translation thanks to Transformers and demonstrate our heavily multilingual speech-to-text translation. Primarily, however, I will illustrate and warn about common misconceptions and evaluation fallacies that we know well from the MT domain, e.g. attributing observed gains to the wrong causes.

Unfortunately, the current AI hype is fuelled to some extent by such mismeasurements. If we do not bring a more technically sound and realistic assessment of large language models' abilities into the discussion, we risk another AI winter, i.e. a sudden decline in interest and in both private and public funding for AI development.

Ondřej Bojar

Ondřej Bojar is an Associate Professor at ÚFAL (Institute of Formal and Applied Linguistics), Charles University, and a lead scientist in Machine Translation in the Czech Republic. He has been co-organizing a well-known series of shared tasks in machine translation and machine translation evaluation (WMT) since 2013. His system dominated English-Czech translation in 2013-2015, before deep learning and neural networks fundamentally changed the field. Having taken part in and later supervised ÚFAL’s participation in a series of EU projects (EuroMatrix, EuroMatrixPlus, MosesCore, QT21, HimL, CRACKER, Bergamot), he has recently concluded his coordination of the EU project ELITR, which focused on simultaneous speech translation into over 40 languages. ELITR also coined the task of project meeting summarization with its AutoMin 2021 and 2023 shared tasks.

The seminar's program consists of a one-hour lecture followed by a discussion with no time limit. The lecture is based on something extraordinary by international standards, or at least remarkable, that the speaker has discovered and will explain in a way that is understandable and interesting to the wider computer science community. Lectures are normally given in English.

The idea of the Prague Computer Science Seminar arose from discussions among representatives of several research institutions on how to eliminate the unnecessary fragmentation of the computer science community in the Czech Republic.

The seminar is prepared by an organizing committee consisting of Roman Barták (MFF UK), Jaroslav Hlinka (ÚI AV ČR), Michal Chytil, Pavel Kordík (FIT ČVUT), Michal Koucký (MFF UK), Jan Kybic (FEL ČVUT), Michal Pěchouček (FEL ČVUT), Jiří Sgall (MFF UK), Vojtěch Svátek (FIS VŠE), Michal Šorel (ÚTIA AV ČR), Tomáš Werner (FEL ČVUT), and Filip Železný (FEL ČVUT).