This repository contains the code and results from the paper "Statistical Comparison of LLM and NMT Translations for Japanese-English".
All API keys have been removed from these files, but they are otherwise as used in the creation of the paper. If you intend to run these files, you will need to supply your own API keys and adjust the "DIRECTORY" and "PRIMARY_DIRECTORY" variables to match the location of the files on your computer. Additionally, these scripts were created on Windows 11 and are not guaranteed to run properly on Linux or macOS.
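As a minimal sketch of the path adjustment, assuming the scripts are Python and that both variables hold folder paths, building them with pathlib avoids hard-coded Windows separators. The helper resolve_directory and the "data" folder name are hypothetical and not taken from the repository; only the two variable names come from the scripts.

```python
from pathlib import Path


def resolve_directory(path_text):
    """Return an absolute path built with the separators of the current OS."""
    return Path(path_text).resolve()


# Hypothetical values: replace "." and "data" with the real locations
# of the repository files on your machine.
PRIMARY_DIRECTORY = resolve_directory(".")
DIRECTORY = PRIMARY_DIRECTORY / "data"
```

Joining paths with the / operator instead of string concatenation is one way to reduce path-related failures when moving the scripts off Windows, though it does not address other platform differences.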
priorDatasets.zip contains all the processed data from this study. Each sub-corpus is labeled "NCIT-#", with the "#" representing the number of sentences in the corpus. The original NICT Bilingual Corpus is not provided in this upload. Specific info about how each sub-corpus was created is included in an info.md document. Output translations from each model are in CSV files, while results are in both CSV and JSON formats.
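To check what the archive contains before extracting it, a short Python sketch like the one below can count file types per top-level folder. It assumes each sub-corpus sits in its own top-level folder inside the zip, which is not confirmed here; the function name summarize_archive is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Map each top-level folder in the archive to a count of file extensions."""
    summary = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            member = PurePosixPath(info.filename)
            # Files stored at the archive root are grouped under "."
            top_level = member.parts[0] if len(member.parts) > 1 else "."
            counts = summary.setdefault(top_level, Counter())
            counts[member.suffix.lower()] += 1
    return summary
```

Calling summarize_archive("priorDatasets.zip") from the folder that holds the archive returns a dictionary keyed by folder name, which makes it easy to confirm that each sub-corpus has its info.md, CSV, and JSON files.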