in reply to Re^4: HTML::Parser, file, print to Terminal
in thread HTML::Parser, file, print to Terminal

The snippet you show is encoded in UTF-8.

Next step: determine the encoding of the file in which umlauts display correctly on your terminal.
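One way to approach that step is to read the raw bytes and test whether they decode as strict UTF-8. The following is only a rough sketch: the helper name guess_encoding is made up for illustration, the default filename is borrowed from this thread, and anything that fails the UTF-8 test is merely assumed to be a legacy 8-bit encoding such as Latin-1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: classify a byte string as ASCII, UTF-8, or "something else".
sub guess_encoding {
    my ($bytes) = @_;

    # No byte above 0x7F: plain ASCII (which is also valid UTF-8 and Latin-1).
    return "ASCII" unless $bytes =~ /[^\x00-\x7F]/;

    # decode() may modify its argument when a CHECK value is given, so work on a copy.
    my $copy = $bytes;
    my $ok = eval { Encode::decode("UTF-8", $copy, Encode::FB_CROAK); 1 };

    return $ok ? "UTF-8" : "not UTF-8 (possibly Latin-1)";
}

# Filename from the command line, falling back to the one used in this thread.
my $filename = @ARGV ? $ARGV[0] : "engleword.html";

if (open my $fh, "<:raw", $filename) {
    local $/;                      # slurp the whole file as raw bytes
    my $bytes = <$fh>;
    close $fh;
    print "$filename: ", guess_encoding($bytes), "\n";
}
else {
    warn "Cannot open $filename: $!\n";
}
```

A file that passes the UTF-8 test is almost certainly UTF-8; a file that fails could be Latin-1, CP1252, or something else entirely, so the last answer is only a hint.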

Or even better: configure a clean UTF-8 environment.
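On the Perl side, a clean setup usually means working with character strings internally and letting an I/O layer do the encoding on output. A minimal sketch, assuming your terminal is already set to display UTF-8 (the sample string is just an illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Encode every character printed to STDOUT as UTF-8.
binmode STDOUT, ":encoding(UTF-8)";

# Three characters: a-umlaut, o-umlaut, u-umlaut.
my $text = "\x{E4}\x{F6}\x{FC}";
print "Umlauts: $text\n";

# Each of these characters takes two bytes once encoded as UTF-8.
my $octets = encode("UTF-8", $text);
printf "%d characters, %d bytes\n", length($text), length($octets);
```

With the layer in place there is no need to call encode() before every print; it is used here only to show the difference between characters and bytes.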

I suppose the confusion lies in => if I create the file, I get my Latin-1. If I didn't create the file, there is only ASCII.

I'm confused indeed. If you don't create a file, it doesn't exist, neither with ASCII nor with UTF-8.

Speaking of confusion, I think you are trying to achieve too much in one step. For example, the title of your question mentions HTML::Parser, which doesn't appear in the posting at all.

So, small steps:


Re^6: HTML::Parser, file, print to Terminal
by victor_charlie (Novice) on Jul 13, 2010 at 19:37 UTC

    Okay, this DOES work...

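    #!/usr/bin/perl -w
    # legaget.pl
    use strict;
    use Encode;

    my $filename = "engleword.html";
    open FILE, "<", $filename or die $!;    # $! (not $1) holds the error message
    while ( my $line = <FILE> ) {
        # bytes are read as Latin-1 characters, then written out as UTF-8
        print encode( "utf8", $line );
    }
    close(FILE);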

    What I have learned...

    • use utf8; only tells Perl that the source code itself is written in UTF-8; it does not encode input or output.
    • I still have to grab the HTML and write it to a file. I would still like to encode the string in place; maybe later.
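For the "encode the string in place" wish, the Encode module has from_to(), which converts a byte string from one encoding to another in place. A small sketch, assuming the input really is Latin-1 (the sample string is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(from_to);

# "Gruesse" with u-umlaut and sharp s, given as Latin-1 bytes.
my $line = "Gr\xFC\xDFe\n";

# Convert the bytes in $line from Latin-1 to UTF-8, overwriting $line.
from_to($line, "iso-8859-1", "UTF-8");

# $line now holds UTF-8 bytes; print them unchanged (no encoding layer on STDOUT).
print $line;
```

Note that from_to() works on bytes, not on decoded character strings, so it should not be combined with an :encoding layer on the same filehandle, or the text gets encoded twice.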

    I have come across this encoding problem as a graphic artist. Customers used MS Word to generate text and then pasted the resulting text into HTML, or Adobe PageMaker, PDF, etc. Everything is just hunky-dory on a WinBox, but on a Mac or Linux the results had missing characters. MS was late adopting Unicode. MS thought they had another answer with OpenType (I think it was), a font technology in partnership with Adobe. That fell apart. But in pre-XP MS text products the first byte set the encoding for the text file. I used to have a FreeWare program on the PC that manually changed that byte.

    Forgive me, I worked on this silly problem all day, but I'm loving Perl.