Re: Peeling Data with Reserved Characters and Long Lines

That was it!

Turns out they're UTF-16 encoded. Hadn't thought of that. I saved one test file in Roman and one in Latin; the scripts worked on both. I don't yet know if the specific data that has to be matched loses info if I convert to Roman/Latin, but at least I'm on a better path.

Thanks.

Replies are listed 'Best First'.

Re^2: Peeling Data with Reserved Characters and Long Lines
by Eliya (Vicar) on Mar 13, 2011 at 01:48 UTC

I don't yet know if the specific data that has to be matched loses info if I convert to Roman/Latin

You can tell Perl the file is encoded in UTF-16, so it will decode it properly. This way you won't lose anything. E.g.

my $infile = shift @ARGV;
# The :encoding layer decodes UTF-16 into Perl characters as the file is read.
open my $fh, "<:encoding(UTF-16)", $infile or die $!;

while (<$fh>) {
   ...
}

(In case the file has no BOM, you might need to name the byte order explicitly, e.g. encoding(UTF-16LE) instead of encoding(UTF-16).)
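If you don't know in advance whether the files carry a BOM, one option is to peek at the first two bytes and pick the layer from that. The following is only a sketch: utf16_layer_for is a hypothetical helper name, not something from a module, and the fallback to UTF-16LE for BOM-less files is an assumption you would need to check against your own data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): choose an encoding name by
# checking the first two bytes of the file for a UTF-16 byte order mark.
sub utf16_layer_for {
    my ($path, $fallback) = @_;
    $fallback //= 'UTF-16LE';    # assumption: BOM-less files are little-endian

    open my $raw, '<:raw', $path or die "Cannot open $path: $!";
    my $n = read $raw, my $head, 2;
    close $raw;

    # FF FE or FE FF is a BOM, so plain UTF-16 can work out the byte order.
    return 'UTF-16'
        if defined $n && $n == 2
        && ($head eq "\xFF\xFE" || $head eq "\xFE\xFF");
    return $fallback;
}

my $infile = shift @ARGV;
if (defined $infile) {
    my $layer = utf16_layer_for($infile);
    open my $fh, "<:encoding($layer)", $infile
        or die "Cannot open $infile: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        # $line holds decoded characters here; match against it as usual.
        $count++;
    }
    close $fh;
    print "Read $count line(s) using $layer\n";
}
```

Run it with the data file as the only argument. If the fallback guess is wrong for a BOM-less file, the decoded text will look like garbage, and passing 'UTF-16BE' as the second argument to the helper would be the next thing to try.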
