in reply to dynamically detect code page
Is the file really written in different encodings for different languages? That is, it's not all UTF-8? If it really does contain multiple encodings, I think the best you can do is make an educated guess for each line. Since your lines look fairly uniform, that guess can probably be quite reliable. See perldoc Encode. I would try decoding the data with each candidate language's encoding in turn, then checking each result for unlikely characters or combinations of characters: a wrong encoding tends to give itself away with stray control characters or odd letter sequences.
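A minimal sketch of that approach, assuming each line is handled as a separate byte string and that the candidates are UTF-8, cp1251 (Russian) and iso-8859-7 (Greek). Those encodings, the subroutine names `guess_decode`, `implausibility` and `count_matches`, and the scoring weights are all hypothetical choices for illustration, not anything from your data. It uses `Encode::decode` with `FB_CROAK`, so an encoding in which the bytes are invalid is ruled out, and then scores the surviving decodings with a few crude heuristics.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical candidate list, tried in order (earlier wins a tie).
# The regex says which non-ASCII letters are plausible for that
# encoding's language; ASCII letters are always accepted.
my @CANDIDATES = (
    [ 'UTF-8'      => qr/\p{L}/ ],        # any script is plausible in UTF-8
    [ 'cp1251'     => qr/\p{Cyrillic}/ ], # e.g. Russian
    [ 'iso-8859-7' => qr/\p{Greek}/ ],    # e.g. Greek
);

# Number of non-overlapping matches of $re in $text.
sub count_matches {
    my ($text, $re) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

# Lower is better: penalize characters and combinations that are
# unlikely in genuine text, so a wrong decoding tends to score higher.
sub implausibility {
    my ($text, $letters) = @_;
    my $score = 0;

    # Control characters (other than tab/newline/CR), unassigned code
    # points and U+FFFD almost never appear in real text.
    $score += 10 * count_matches($text, qr/(?![\t\n\r])[\p{Cc}\p{Cn}\x{FFFD}]/);

    # A lowercase letter directly followed by an uppercase one is a
    # common symptom of decoding with the wrong table.
    $score += count_matches($text, qr/\p{Ll}\p{Lu}/);

    # Non-ASCII letters from a script this language does not use.
    $score += count_matches($text, qr/(?!\p{ASCII})(?!$letters)\p{L}/);

    return $score;
}

# Return (encoding, decoded text) for the most plausible candidate,
# or an empty list if no candidate can decode the bytes at all.
sub guess_decode {
    my ($bytes) = @_;
    my ($best_enc, $best_text, $best_score);
    for my $cand (@CANDIDATES) {
        my ($enc, $letters) = @$cand;
        my $copy = $bytes;    # decode() with a CHECK flag may modify its input
        my $text = eval { decode($enc, $copy, FB_CROAK) };
        next unless defined $text;    # bytes are invalid in this encoding
        my $score = implausibility($text, $letters);
        if (!defined $best_score || $score < $best_score) {
            ($best_enc, $best_text, $best_score) = ($enc, $text, $score);
        }
    }
    return defined $best_enc ? ($best_enc, $best_text) : ();
}

# Demo: words in three different byte encodings.
for my $bytes (
    encode('UTF-8',      "\x{041F}\x{0440}\x{0438}\x{0432}\x{0435}\x{0442}"),
    encode('cp1251',     "\x{0420}\x{043E}\x{0441}\x{0441}\x{0438}\x{044F}"),
    encode('iso-8859-7', "\x{039A}\x{03B1}\x{03BB}\x{03B7}\x{03BC}\x{03AD}\x{03C1}\x{03B1}"),
) {
    my ($enc) = guess_decode($bytes);
    print "$enc\n";
}
```

The scoring is deliberately crude. Single-byte encodings such as cp1251 and iso-8859-7 decode almost any byte sequence, so between those two the guess rests entirely on the heuristics; you would want to tune them (or add checks for common words in each language) against samples of your own file.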