Re^4: dynamically detect code page

you're pretty much out of luck. you need more information than the log file alone gives you.

from the log provided, it looks like the multiple-charset part is a filename (possibly containing a virus or some such). maybe the filename is really short, one or two characters. there's no way to tell which codepage is the correct one. for instance, the single byte 0xE5 can be any of the following in just the first few ISO-8859 encodings...

å
ĺ
ċ
х
م
ε

there is no way to correctly convert this one (or two or three) byte filename/whatever into UTF-8 unless you know the correct codepage beforehand. it just isn't going to happen.
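to see the ambiguity for yourself, here is a minimal sketch in python (not the language of this thread, just a convenient one for a demo). it decodes the same byte under six of the ISO-8859 code pages using python's standard codec names. the variable names are made up for illustration.

```python
import unicodedata

# the single ambiguous byte from the example above
raw = b"\xe5"

# standard python codec names for six of the ISO-8859 code pages
codecs_to_try = [
    "iso8859_1", "iso8859_2", "iso8859_3",
    "iso8859_5", "iso8859_6", "iso8859_7",
]

# decode the same byte under every code page
decoded = {name: raw.decode(name) for name in codecs_to_try}

# print code point and character name (ASCII only, so it is safe on any terminal)
for name, char in decoded.items():
    print(f"{name}: U+{ord(char):04X} {unicodedata.name(char)}")
```

every decode succeeds and every result is a different, perfectly valid character, which is why nothing in the byte itself can tell you which code page was intended.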

you'll have to arrange to receive a list of the machine names and their respective codepages beforehand. but once you have that, it is pretty easy to convert everything to UTF-8 and do any sort of regex manipulation.
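a rough sketch of that last step, again in python and under stated assumptions: the machine names, the `MACHINE_CODEPAGES` table and the `decode_field` helper are all hypothetical, and how you pull the raw bytes and the machine name out of your log lines is up to you.

```python
import re

# hypothetical lookup table: machine name -> code page it logs in
MACHINE_CODEPAGES = {
    "host-a": "iso8859_1",
    "host-b": "iso8859_5",
    "host-c": "iso8859_7",
}

def decode_field(machine: str, raw: bytes) -> str:
    """Decode raw bytes from a given machine using its known code page."""
    # a KeyError here means the machine is missing from the list,
    # which is better than silently guessing a code page
    codepage = MACHINE_CODEPAGES[machine]
    return raw.decode(codepage)

def to_utf8(machine: str, raw: bytes) -> bytes:
    """Re-encode the field as UTF-8 for storage or further processing."""
    return decode_field(machine, raw).encode("utf-8")

# once decoded, ordinary unicode-aware regexes work on the text
name = decode_field("host-b", b"\xe5.exe")
match = re.search(r"^(\w+)\.exe$", name)
```

the point of the design is that the code page is looked up, never guessed: an unknown machine fails loudly instead of producing plausible-looking garbage.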
