WCC-JC: A Web-Crawled Corpus for Japanese-Chinese Neural Machine Translation
2022
Jinyi Zhang | Ye Tian | Jiannan Mao | Mei Han | Tadahiro Matsumoto
Few Japanese-Chinese bilingual corpora are currently large enough to serve as training data for neural machine translation (NMT). In particular, few corpora include spoken language such as daily conversation. In this research, we construct a sizable Japanese-Chinese bilingual corpus by crawling subtitle data for movies and TV series from websites. We calculated BLEU scores for the constructed corpus, WCC-JC (Web Crawled Corpus, Japanese and Chinese), and for the other corpora used for comparison. We also manually evaluated the translations produced by a model trained on WCC-JC to confirm the corpus's quality and effectiveness.
This bibliographic record was provided by the Directory of Open Access Journals.