in reply to Peeling Data with Reserved Characters and Long Lines

That was it!

Turns out they're UTF-16 encoded. I hadn't thought of that. I saved one test file as Roman and one as Latin, and the scripts worked on both. I don't yet know if the specific data that has to be matched loses info if I convert to Roman/Latin, but at least I'm on a better path.

Thanks.

Re^2: Peeling Data with Reserved Characters and Long Lines
by Eliya (Vicar) on Mar 13, 2011 at 01:48 UTC
    I don't yet know if the specific data that has to be matched loses info if I convert to Roman/Latin

    You can tell Perl the file is encoded in UTF-16, so it will decode it properly.  This way you won't lose anything.  E.g.

    my $infile = shift @ARGV;
    open my $fh, "<:encoding(UTF-16)", $infile or die $!;
    while (<$fh>) { ...

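    (Perl's UTF-16 decoder relies on the BOM to determine the byte order. In case the file has no BOM, you need to name the byte order yourself, i.e. use encoding(UTF-16LE) or encoding(UTF-16BE) instead of encoding(UTF-16).)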
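
If you don't know in advance whether the files carry a BOM, one option is to peek at the first two bytes and pick the layer accordingly. The following is only a rough sketch: the helper name utf16_layer_for is made up for illustration, and the UTF-16LE fallback for BOM-less files is an assumption (it is what Windows tools typically write), so adjust it if your data is big-endian.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): choose a PerlIO layer
# for a UTF-16 file by checking for a byte-order mark at the start.
# The fallback byte order for BOM-less files is an assumption.
sub utf16_layer_for {
    my ($path, $fallback) = @_;
    $fallback = 'UTF-16LE' unless defined $fallback;

    # Read the first two bytes without any decoding.
    open my $raw, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($raw, my $head, 2);
    close $raw;

    my $has_bom = defined $got && $got == 2
        && ($head eq "\xFF\xFE" || $head eq "\xFE\xFF");

    # With a BOM, plain UTF-16 lets Perl work out the byte order itself.
    return $has_bom ? ':encoding(UTF-16)' : ":encoding($fallback)";
}
```

You would then open the file with something like open my $fh, '<' . utf16_layer_for($infile), $infile or die $!; and read from $fh as before.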