SiGiN has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to put together my own (slightly customized) webmail script. I've figured out how to convert HTML to text.

Something like this:

use HTML::Scrubber;
my $scrubber = HTML::Scrubber->new;
my $result = $scrubber->scrub_file('index.htm');

But I just can't find any way to convert text to HTML the way most webmail clients do: replacing links with <a> tags, escaping unsafe characters, and so on.

I am not too lazy to write my own code for this, but I suspect it has been done many times already, and I am afraid of making small, obvious mistakes that would leave the script vulnerable to XSS.

Thanks in advance. Sincerely, Timo

Replies are listed 'Best First'.
Re: Text to HTML convert.
by jeffa (Bishop) on Oct 07, 2004 at 14:56 UTC

      HTML::FromText totally wrecks Cyrillic letters =/. Seems like I'd rather do it myself.

      Will keep you informed.

      Update:

      I managed to make it this way:

      use HTML::FromText;
      use HTML::Entities;

      my $text = shift;
      my $t2h = HTML::FromText->new({
          paras     => 0,
          blockcode => 0,
          tables    => 0,
          bullets   => 0,
          numbers   => 0,
          urls      => 1,
          email     => 1,
          bold      => 0,
          underline => 0,
          metachars => 0,
      });
      $text = encode_entities($text, '<>&"');
      my $html = $t2h->parse($text);

        Something like this is probably all you need

        sub get_text {
            my ($text, $cols) = @_;
            $cols ||= 80;
            require Text::Wrap;
            $Text::Wrap::columns = $cols;
            # tabs to 4 spaces
            $text =~ s/\t/    /g;
            # now wrap it
            $text = Text::Wrap::wrap('', '', $text);
            # iterate over chunks, escaping HTML and linking
            $text =~ s{(\S+)}
                      {
                        local $_ = $1;
                        # escape once and use the result in both the href and the
                        # link text, so a quote in a token cannot break out of the attribute
                        my $safe = escapeHTML($_);
                          m/^(?:http|ftp)/ ? qq!<a href="$safe">$safe</a>!
                        : m/^www\./        ? qq!<a href="http://$safe">$safe</a>!
                        : m/^[^@]+@[^@]/   ? qq!<a href="mailto:$safe">$safe</a>!
                        :                    $safe;
                      }ge;
            # fix whitespace (and newlines if not using pre tags)
            $text =~ s/( {2,})/"&nbsp;" x length $1/eg;
            #$text =~ s/\n/<br>\n/g;
            # wrap in pre tags
            $text = '<pre>' . $text . '</pre>';
            return $text;
        }

        sub escapeHTML {
            my ($escape) = @_;
            return '' unless defined $escape;
            # '&' must go first, or the entities produced below get escaped again
            $escape =~ s/&/&amp;/g;
            $escape =~ s/"/&quot;/g;
            $escape =~ s/</&lt;/g;
            $escape =~ s/>/&gt;/g;
            # anything outside ASCII becomes a numeric character reference
            $escape =~ s/([^\000-\177])/'&#' . (sprintf "%3d", ord $1) . ';'/eg;
            return $escape;
        }

        cheers

        tachyon
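
A side note on the numeric-entity step in escapeHTML above: it works one character at a time, so if the text is still raw UTF-8 bytes, every byte of a Cyrillic letter gets its own entity. A minimal sketch, assuming the mail body arrives as UTF-8 bytes (the helper name to_numeric_entities is made up for illustration), is to decode first with the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: assumes $bytes holds UTF-8 encoded text.
sub to_numeric_entities {
    my ($bytes) = @_;
    return '' unless defined $bytes;
    # Turn the byte string into a character string first ...
    my $chars = decode('UTF-8', $bytes);
    # ... so that ord() sees whole code points, not single bytes.
    $chars =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $chars;
}

# The two bytes D0 9F are the UTF-8 encoding of U+041F (Cyrillic "Pe").
print to_numeric_entities("\xD0\x9F"), "\n";   # prints &#1055;
```

If the page itself is served as UTF-8, this step may not be needed at all; escaping only <, >, & and " and leaving the other characters alone would then be enough.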

Re: Text to HTML convert.
by Grygonos (Chaplain) on Oct 07, 2004 at 14:30 UTC

    I don't have any pre-dev'd solution for you. However, may I recommend the following (just off the top of my head):

    use strict;
    use warnings;

    # Just listing a few bad chars (I think, hehe) for example
    my %escape_chars = (
        '>' => '&gt;',
        '<' => '&lt;',
    );

    # Text for the email would be either in a scalar or an array, I'll presume
    my $email_text = 'Hi this is a math email 2 + 2 = 4. and 4 > 5';
    my @email_text = ('2 + 2 = 4.', 'and 4 > 5');

    # if scalar
    foreach my $key (keys %escape_chars) {
        $email_text =~ s/\Q$key\E/$escape_chars{$key}/g;
    }

    # if each line of the email is in an array
    foreach my $key (keys %escape_chars) {
        s/\Q$key\E/$escape_chars{$key}/g for @email_text;
    }

    This is just a method for finding characters that need escaping and replacing them with their HTML counterparts. I believe (though I don't know for sure) that this is something like what you need to do. I believe there is a module with an http link regex that you can use. Also, as a feature thought: you could test the links you find to make sure they are valid (read: not broken).

      You also need to escape the & (ampersand) character, as this is also a reserved char in HTML.
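
To illustrate the ampersand point: if '&' is simply added to a lookup table that is applied one key at a time, hash order decides whether '&' runs before or after '<' and '>', and running it afterwards turns &lt; into &amp;lt;. A minimal sketch that sidesteps the ordering question by replacing all reserved characters in a single pass (the helper name escape_html is an assumption, not something from a module):

```perl
use strict;
use warnings;

# Reserved characters and their entities.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# Hypothetical helper: one substitution handles every reserved character,
# so an entity produced for '<' is never seen again by the '&' rule.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_html('2 + 2 = 4 & 4 > 5'), "\n";   # prints 2 + 2 = 4 &amp; 4 &gt; 5
```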

      I dug this code out of one of my old scripts. I'm not saying it's perfect, by any means. In particular it can treat things as URLs when they are not well formed.

      while (<F>) {
          s/&/&amp;/g;
          s/</&lt;/g;
          s/>/&gt;/g;
          s!(https?://[-~@=_%;&/\+\.\?a-zA-Z0-9]+)!<a href="$1">$1</a>!g;
          s!([-_\+\.a-zA-Z0-9]+@[-_\+\.a-zA-Z0-9]+)!<a href="mailto:$1">$1</a>!g;
          # then print $_, or whatever you want to do with it
      }

      But I hope it helps you get started. Other monks are (as always) welcome to suggest improvements or point out flaws!

      Update: The above assumes the whole output is wrapped in <pre> tags; otherwise the original paragraph and line formatting won't be preserved.
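
If <pre> is not an option, the layout can be approximated by hand. This is only a sketch, assuming the text has already been escaped, and the helper name keep_layout is made up: it turns runs of spaces into &nbsp; and line ends into <br>, much like the commented-out line in tachyon's reply.

```perl
use strict;
use warnings;

# Hypothetical helper: expects text whose HTML metacharacters are already escaped.
sub keep_layout {
    my ($html) = @_;
    return '' unless defined $html;
    # Normalise CRLF and bare CR line ends to LF.
    $html =~ s/\r\n?/\n/g;
    # Browsers collapse runs of spaces, so keep them as non-breaking spaces.
    $html =~ s/( {2,})/'&nbsp;' x length $1/ge;
    # Make every line end visible outside <pre>.
    $html =~ s/\n/<br>\n/g;
    return $html;
}

print keep_layout("a  b\nc"), "\n";
```

Note that a run converted entirely to &nbsp; will not wrap, so very wide lines may still need Text::Wrap beforehand.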

      Update 2: Actually, the CPAN module HTML::FromText as suggested by jeffa looks like a better idea. I'll have to look into it myself...


Re: Text to HTML convert.
by Intrepid (Curate) on Oct 07, 2004 at 19:57 UTC
Re: Text to HTML convert.
by swaroop.m (Monk) on Oct 08, 2004 at 09:42 UTC
    Hi, use the HTML::Parser module.

    ## Use the module HTML::Parser
    use HTML::Parser;
    my $returnText;
    my $objParser;
    ## Take the function parameters into local variables
    my ($sourceFilePath) = "1.html";
    ## Mention that the parsed file should be converted to text
    $objParser = HTML::Parser->new(text_h => [ sub { $returnText .= shift }, 'dtext' ])
        || die "Unable to create an object of HTML::Parser :: $!";
    ## Parse the html file
    $objParser->parse_file($sourceFilePath);
    print $returnText;

    Hope this helps