latin1-tokenizer < latin1.txt > latin1.tkns