Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm a real newbie and would appreciate any help. I'm trying to modify a script that parses the HTML tags from a page and emails selected content using sendmail.

My problem is that the content is in Spanish and the special characters (ñ, etc.) are not coming through in the email (in MS Entourage, at least); they show up as ~ or / or ^ instead. I'm trying to replace the HTML entities with the actual characters and have the charset set to ISO-8859-1, but I must be doing something wrong. Any ideas?

The code setting up the email looks like:
open(MAIL,"|$G{MailProgram}");
print MAIL "To: $In{toemail}\n";
print MAIL "From: \"$In{name}\" \<$In{fromemail}\>\n";
print MAIL "X-IP_Address: $ENV{REMOTE_ADDR}\n";
print MAIL "MIME-version: 1.0\n";
print MAIL "Content-Type: text/plain; charset=ISO-8859-1\n";
print MAIL "Subject: Articulo\n\n";
print MAIL "$In{message}\n\n";
print MAIL "$In{name} le manda este articulo\n\n";
$printline = 0;
for(@L) {
    if($_ =~ /BEGIN mailer/) { $printline = 1; next; }
    next unless $printline;
    if($_ =~ /END mailer/) { $printline = 0; next; }
    $_ =~ s/<.*?>//g;
    $_ =~ s/&ntilde;/ñ/g;
    # ...and so on with the other characters...
    print MAIL $_;
}
close MAIL;

Replies are listed 'Best First'.
Re: Spanish special characters
by ChemBoy (Priest) on Jul 26, 2001 at 04:42 UTC

    I'm just guessing here, but might it be that your text editor and your Perl interpreter disagree on character sets? I've never seen a literal high-bit character (such as ñ) in source code before, which may be coincidence or may not. As a test, why not try replacing the right-hand side of each substitution with the escape sequence for the relevant character? That is, change your last substitution to

    s/&ntilde;/\xf1/g;
    and so on.
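
As a rough illustration of the escape-sequence idea, the sketch below keeps the replacements in one lookup table instead of one substitution per character. The names `%latin1_for` and `replace_entities` are made up for this example and the table is deliberately incomplete; the hex values are the ISO-8859-1 codes for the characters named in the comments.

```perl
use strict;
use warnings;

# Hypothetical lookup table: entity name => ISO-8859-1 character,
# written as a hex escape so the source file stays pure ASCII.
my %latin1_for = (
    ntilde => "\xf1",   # n with tilde
    Ntilde => "\xd1",   # N with tilde
    aacute => "\xe1",   # a with acute accent
    eacute => "\xe9",   # e with acute accent
    iacute => "\xed",   # i with acute accent
    oacute => "\xf3",   # o with acute accent
    uacute => "\xfa",   # u with acute accent
    uuml   => "\xfc",   # u with diaeresis
    iexcl  => "\xa1",   # inverted exclamation mark
    iquest => "\xbf",   # inverted question mark
);

# Replace every entity found in the table; leave unknown entities untouched.
sub replace_entities {
    my ($line) = @_;
    $line =~ s/&(\w+);/exists $latin1_for{$1} ? $latin1_for{$1} : "&$1;"/ge;
    return $line;
}

print replace_entities("Ma&ntilde;ana ser&aacute; otro d&iacute;a\n");
```

Because no high-bit character appears literally in the source, the editor's character set no longer matters; what reaches the mail program is decided entirely by the escapes.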

    Better yet, though, don't write your own code to do it; use HTML::Entities to decode each string, thus:

    #!/usr/bin/perl -w
    use strict;          # you were using strict, weren't you? ;-)
    use HTML::Entities;  # exports decode_entities by default

    # snip (your MAIL statements here)

    for (@L) {
        print MAIL decode_entities($_);
    }
    Isn't that simpler? :-)

    While I'm in "use CPAN" mode, I suggest you look at HTML::Parser for the tag-removing regex--parsing HTML (even just taking out tags) with regular expressions is a dangerous idea.
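
For what the HTML::Parser route might look like, here is a minimal sketch, assuming the version 3 API of HTML::Parser is installed. The helper name `html_to_text` is invented for the example. It collects only the text portions of the input, and the `dtext` argspec asks the parser to hand over that text with entities already decoded, so no tag-stripping regex or entity substitution is needed.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of an HTML fragment with the tags
# dropped and the entities decoded.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called for each run of text; 'dtext' means entity-decoded text.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    $parser->parse($html);
    $parser->eof;    # signal end of input so any buffered text is flushed
    return $text;
}

print html_to_text("<p>El ni&ntilde;o <b>peque&ntilde;o</b></p>\n");
```

In the original loop this would stand in for both the `s/<.*?>//g` line and the entity substitutions. Note that this sketch keeps all text, including the contents of any script or style elements, so a real page may need extra handlers.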

    Good luck!



    If God had meant us to fly, he would *never* have give us the railroads.
        --Michael Flanders

Re (tilly) 1: Spanish special characters
by tilly (Archbishop) on Jul 26, 2001 at 08:05 UTC