Re: Text File Encoding under Windows

I'd need to see specific examples to give a complete answer, but there are some things I would suggest. Don't trust text editors or the display when you view and print characters. If something strange is going on, print out the ordinal value of each character with ord($char). This gives you numeric values that you can trust, and it will reveal any character that isn't visible.
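Something along these lines would do for the dump. It's only a sketch: dump_ords is a name I made up for illustration, and it assumes you already have the text in a scalar.

```perl
use strict;
use warnings;

# Hypothetical helper: return one "decimal  hex  display" line per character.
sub dump_ords {
    my ($text) = @_;
    my @lines;
    for my $char (split //, $text) {
        my $ord = ord($char);
        # Show printable ASCII as-is; show everything else as a dot.
        my $shown = ($ord >= 32 && $ord <= 126) ? $char : '.';
        push @lines, sprintf("%3d  0x%02X  %s", $ord, $ord, $shown);
    }
    return @lines;
}

# "A", a tab, "b", and one 8-bit value (0xE9).
print "$_\n" for dump_ords("A\tb\x{E9}");
```

If you read the file with open(my $fh, '<:raw', $path), you will see the bytes exactly as they are on disk, which is usually what you want when hunting for odd characters.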

A character in the 32-126 range is normal (printable ASCII). If it's less than 32, and it's not \n, then change it to ' ': $text =~ s/[\x00-\x09\x0B-\x1F]/ /g; If it's above 126, then it's either DEL (127) or an 8-bit quantity that can mess up the regexes, and probably Windows. What you do with these values depends on the assignment. This will delete them:

     

     # Delete DEL (127) and every 8-bit value (128-255).
     $text =~ s/[\x7F-\xFF]//g;
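Putting the control-character step and the delete step together, a rough sketch could look like this. The name to_plain_ascii is hypothetical, and it assumes $text holds raw bytes rather than decoded Unicode.

```perl
use strict;
use warnings;

# Hypothetical helper combining the two steps described above.
sub to_plain_ascii {
    my ($text) = @_;
    # Control characters other than \n (0x0A) each become one space.
    $text =~ s/[\x00-\x09\x0B-\x1F]/ /g;
    # DEL (0x7F) and everything in the 8-bit range are deleted.
    $text =~ s/[\x7F-\xFF]//g;
    return $text;
}

print to_plain_ascii("a\tb\r\nc\x{E9}d");
```

Note that the \r of a Windows line ending becomes a space here. If you don't want trailing spaces, one option is to strip \r before calling it.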

Some of the 8-bit values represent standard punctuation, and you can change them into 7-bit quantities. If there are two, three, or four consecutive 8-bit characters, then you are probably looking at a single multi-byte UTF-8 character, and you have to decode it as UTF-8 rather than treat each byte separately. There's a definition on Wikipedia. The Encode module that ships with Perl handles the decoding, and there might be other packages on CPAN. There's also a huge translation table online. http://www.utf8-chartable.de/unicode-utf8-table.pl
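Here is one way that could go with Encode. It's a sketch under assumptions: bytes_to_ascii_punct is a made-up name, the fallback to Windows-1252 is my guess at where non-UTF-8 bytes would come from on Windows, and the list of punctuation is only a small sample.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes, trying strict UTF-8 first and
# falling back to Windows-1252 (an assumption about the file's origin).
sub bytes_to_ascii_punct {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    $text = decode('cp1252', $bytes) unless defined $text;

    # Replace a few common typographic characters with 7-bit equivalents:
    # curly single and double quotes, en and em dashes, and the ellipsis.
    $text =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;
    $text =~ s/[\x{2013}\x{2014}]/-/g;
    $text =~ s/\x{2026}/.../g;
    return $text;
}

print bytes_to_ascii_punct("it\xE2\x80\x99s"), "\n";    # UTF-8 input
print bytes_to_ascii_punct("\x93hi\x94"), "\n";         # Windows-1252 input
```

Anything that is still above 126 after this can be handled by whatever rule the assignment calls for.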

Hope this helps. Sean
