victor_charlie has asked for the wisdom of the Perl Monks concerning the following question:

I can grab web content with:

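    use strict;
    use warnings;
    use LWP::Simple;    # exports getstore() and is_success()

    my $url  = 'http://www.webpage.htm';
    my $file = 'myWebpage.html';

    # Save the page to disk; no character decoding is done here
    my $status = getstore($url, $file);
    die "Error $status on $url" unless is_success($status);

    open(my $in, '<', $file) or die "Can't open $file: $!";
    # ... read and process the saved page here ...
    close($in);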

The ISO-8859-1 characters are there if I open that file in a word processor: 'ü', 'ö', 'ß' ... are all there. But when I read that file back into Perl and print it to the terminal, those characters are missing, e.g. �
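A likely explanation for the �, assuming the terminal expects UTF-8 (the usual Linux default): the saved page holds single Latin-1 bytes such as 0xFC for 'ü', while a UTF-8 terminal expects a two-byte sequence for that character. A lone 0xFC is not valid UTF-8, so the terminal shows the replacement symbol. This sketch, using only the core Encode module, prints the bytes each encoding produces for the three characters from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Characters from the question, written as escapes so this source
# file stays plain ASCII: u-umlaut, o-umlaut, sharp s.
my @chars = ("\x{FC}", "\x{F6}", "\x{DF}");

# Render a byte string as space-separated hex, e.g. "C3 BC".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my %bytes_for;
for my $ch (@chars) {
    my $latin1 = encode('iso-8859-1', $ch);   # one byte per character
    my $utf8   = encode('UTF-8',      $ch);   # two bytes for these characters
    $bytes_for{$ch} = [ hex_bytes($latin1), hex_bytes($utf8) ];
    printf "U+%04X  Latin-1: %-5s  UTF-8: %s\n", ord($ch), @{ $bytes_for{$ch} };
}
```

The same character is one byte in the file but two bytes in what the terminal wants, so the bytes have to be decoded and re-encoded somewhere between reading and printing.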

If I use ...

use open ":encoding(UTF-8)";

in the code that opens that disk file, printing to the terminal gives me hex escapes instead, e.g. 'H\xF6lzl'.
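The hex escape probably comes from the same mismatch in the other direction: `:encoding(UTF-8)` tells Perl to decode the file's bytes as UTF-8, but the page is Latin-1, so a byte like 0xF6 ('ö') is malformed UTF-8. As far as I know, the `:encoding` layer's default fallback (`$PerlIO::encoding::fallback`) includes PERLQQ, which writes such a byte as a literal `\xHH`. A minimal sketch with the core Encode module reproduces the effect on the name from the question and shows that decoding as Latin-1 gives the intended text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they would sit in a Latin-1 file: "H", 0xF6, "lzl".
my $raw = "H\xF6lzl";

# Right guess: Latin-1 maps every byte to the code point of the same value.
my $as_latin1 = decode('iso-8859-1', $raw);

# Wrong guess: treat the bytes as UTF-8. FB_PERLQQ turns the malformed
# byte into a literal \x.. escape; LEAVE_SRC keeps $raw untouched.
my $as_utf8 = decode('UTF-8', $raw, Encode::FB_PERLQQ() | Encode::LEAVE_SRC());

binmode STDOUT, ':encoding(UTF-8)';   # assumed terminal encoding
print "decoded as UTF-8:   $as_utf8\n";
print "decoded as Latin-1: $as_latin1\n";
```

So the layer on the input side has to name the encoding the file really uses, which for this page is Latin-1, not UTF-8.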

How do I get the Latin-1 characters, umlauts included, to print correctly to the terminal?

Re: HTML::Parser, file, print to Terminal
by moritz (Cardinal) on Jul 13, 2010 at 13:13 UTC
    What encoding does your terminal accept?

    Also, try setting the layer only on STDOUT with binmode STDOUT, ":encoding(yourencoding)" instead of using the open pragma, so that it doesn't interfere with how the file is opened.
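A sketch of that suggestion, assuming the terminal expects UTF-8 (substitute your terminal's encoding). `binmode` puts an encoding layer on STDOUT alone; a file opened afterwards keeps its default layers, so you still decide how that file is decoded. `PerlIO::get_layers`, which is built into Perl, lists the layers on each handle so the difference is visible. The temporary file is only a stand-in for the downloaded page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Encode everything printed to STDOUT as UTF-8 (assumed terminal encoding).
binmode STDOUT, ':encoding(UTF-8)';

# A scratch file stands in for the downloaded page.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

# No layer requested here, so this handle reads raw bytes.
open my $in, '<', $tmp_name or die "Can't open $tmp_name: $!";

my @stdout_layers = PerlIO::get_layers(*STDOUT);
my @file_layers   = PerlIO::get_layers($in);
print "STDOUT layers: @stdout_layers\n";
print "file layers:   @file_layers\n";
close $in;
```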

      Say I compose a short line of text in gedit (the Linux equivalent of Notepad):

      Nüne istá baßt alongnöw.

      and read it back with this simple code:

          #!/usr/bin/perl -w
          # legaget.pl
          use strict;

          my $filename = "ligatext.txt";
          open(my $fh, '<', $filename) or die "Can't open $filename: $!";
          while (my $line = <$fh>) {
              print $line;
          }
          close($fh);

      My terminal does in fact show those special characters. However, take a webpage.html grabbed off the web that, I might add, is not W3C compliant: it doesn't have the meta line that declares the encoding (encoding='iso-xxxx-x'). The browser shows the characters, and the saved copy of that page shows them in gedit, but the same code above prints the < ? > symbol to the terminal instead.
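One plausible reason the gedit test works is that gedit saved the file as UTF-8, which a UTF-8 terminal shows unchanged, while the page is Latin-1. Since the page declares no charset, a pragmatic option is a heuristic: try strict UTF-8 first and fall back to Latin-1 when the bytes are not valid UTF-8. This is a guess, not something LWP or the page tells you. The helper name `decode_utf8_or_latin1` is hypothetical; only the core Encode module is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode as strict UTF-8 if the bytes allow it,
# otherwise assume Latin-1 (any byte sequence is valid Latin-1).
sub decode_utf8_or_latin1 {
    my ($bytes) = @_;
    my $text = eval {
        # FB_CROAK dies on malformed input; LEAVE_SRC keeps $bytes intact.
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $text ? $text : decode('iso-8859-1', $bytes);
}

my $from_gedit = "H\xC3\xB6lzl";   # UTF-8 bytes, as a UTF-8 editor saves them
my $from_page  = "H\xF6lzl";       # Latin-1 bytes, as the page was served

my $t1 = decode_utf8_or_latin1($from_gedit);
my $t2 = decode_utf8_or_latin1($from_page);

binmode STDOUT, ':encoding(UTF-8)';   # assumed terminal encoding
print "$t1\n$t2\n";
```

The heuristic misreads Latin-1 text that happens to form valid UTF-8 (for example the bytes of 'Ã¼'), which is rare in ordinary prose. If the server sends a charset in its Content-Type header, that is a better source than guessing.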

      I might add that I have fought this same problem with MS Word files, as Microsoft puts the Unicode country code in the first byte of its Word .doc format as a hex value.

      Yes, I've read the binmode STDOUT description, but it doesn't show an example. Can you give me a short snippet so I can try it? Maybe: open a file, read it a line at a time, and print to the terminal?
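A short snippet along the lines requested, assuming the page is ISO-8859-1 (as the question says) and the terminal expects UTF-8. The input encoding goes on the `open` call and the output encoding on STDOUT, so the open pragma is not needed. To keep it self-contained, it first writes a small Latin-1 sample file with File::Temp; with the real page you would open `myWebpage.html` instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Output side: encode characters as UTF-8 for the terminal (assumed).
binmode STDOUT, ':encoding(UTF-8)';

# Build a Latin-1 sample file (stand-in for myWebpage.html).
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out, ':encoding(iso-8859-1)';
print {$out} "N\x{FC}ne ist\x{E1} ba\x{DF}t alongn\x{F6}w.\n";
close $out or die "Can't write $file: $!";

# Input side: decode the file's bytes as Latin-1 while reading.
open my $in, '<:encoding(iso-8859-1)', $file or die "Can't open $file: $!";
my @lines;
while (my $line = <$in>) {
    push @lines, $line;
    print $line;          # re-encoded as UTF-8 by the STDOUT layer
}
close $in;
```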

Re: HTML::Parser, file, print to Terminal
by Anonymous Monk on Jul 13, 2010 at 13:12 UTC
    Get or configure a terminal that uses UTF-8 (or some other encoding), and print using that encoding.
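One way to follow that advice without hard-coding UTF-8 is to ask the locale which character set the session uses. This sketch uses `langinfo(CODESET)` from the core I18N::Langinfo module and falls back to UTF-8 if Encode does not recognise the name. It assumes the locale settings (LANG/LC_ALL) actually match what the terminal is configured for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Character set of the current locale, e.g. "UTF-8" or "ISO-8859-1".
my $codeset = langinfo(CODESET) // '';

# Use it only if Encode knows the name; otherwise assume UTF-8.
my $enc_obj  = (length($codeset) && find_encoding($codeset)) || find_encoding('UTF-8');
my $enc_name = $enc_obj->name;

binmode STDOUT, ":encoding($enc_name)";
print "Terminal encoding: $enc_name\n";
print "N\x{FC}ne ist\x{E1} ba\x{DF}t alongn\x{F6}w.\n";
```

The open pragma's `:locale` option does something similar in one line, but like any `use open` setting it also applies to files opened in its scope, which is what the earlier reply suggests avoiding here.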