This release includes an English model (wiki-truecaser-model-en.tar.gz), a German model (wmt-truecaser-model-de.tar.gz), a Russian model (lrl-truecaser-model.ru.tar.gz), and a Spanish model (wmt-truecaser-model-es.tar.gz).
The English model is trained on 2.9M tokens of Wikipedia data and scores 93.01 on Wikipedia test data.
The German model is trained on 2.6M tokens of monolingual text from WMT and scores 97.86 on a test split.
The Russian model is trained on 2M tokens of monolingual text from LORELEI and scores 87.61 on a test split.
The Spanish model is trained on 18M tokens of monolingual text from WMT and scores 93.01 on a test split.
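
The models above ship as `.tar.gz` archives, so they need to be unpacked before use. The following is a minimal sketch of one way to do that with Python's standard `tarfile` module. The helper name `extract_model` and the destination directory are illustrative assumptions, not part of this release, and the layout of files inside each archive is not documented here.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names.

    Hypothetical helper for illustration; not part of the released code.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest (e.g. "../x").
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return [m.name for m in members]
```

For example, after downloading the English model, `extract_model("wiki-truecaser-model-en.tar.gz", "models/en")` would unpack it into `models/en` (a directory name chosen here only for illustration) and return the list of extracted paths.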