Text to HTML convert.

SiGiN has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Text to HTML convert. by jeffa (Bishop) on Oct 07, 2004 at 14:56 UTC
Well, there is always HTML::FromText. Should this be enough for you, you can always run the output through HTML::Tidy.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re^2: Text to HTML convert. by FoxtrotUniform (Prior) on Oct 07, 2004 at 18:58 UTC
Well, there is always HTML::FromText.
See also:
`<self-promotion type="shameless">`
HTML::FromText (review)
Automating a Static Website
HTML::FromText patch
`</self-promotion>`
`-- F o x t r o t U n i f o r m Found a typo in this node? /msg me`
Re^2: Text to HTML convert. by SiGiN (Initiate) on Oct 07, 2004 at 17:29 UTC
HTML::FromText totally wrecks Cyrillic letters =/. Seems like I'd rather do it myself. Will keep you informed.
Update: I managed to make it work this way:

```perl
use HTML::FromText;
use HTML::Entities;

my $text = shift;

# Turn off every HTML::FromText feature except URL and e-mail linking
my $t2h = HTML::FromText->new({
    paras     => 0,
    blockcode => 0,
    tables    => 0,
    bullets   => 0,
    numbers   => 0,
    urls      => 1,
    email     => 1,
    bold      => 0,
    underline => 0,
    metachars => 0,
});

# Escape only the markup characters, leaving Cyrillic text untouched
$text = encode_entities($text, '<>&"');
my $html = $t2h->parse($text);
```
Re^3: Text to HTML convert. by tachyon (Chancellor) on Oct 08, 2004 at 03:06 UTC
Something like this is probably all you need:

```perl
sub get_text {
    my ($text, $cols) = @_;
    $cols ||= 80;
    require Text::Wrap;
    $Text::Wrap::columns = $cols;
    # tabs to 4 spaces
    $text =~ s/\t/    /g;
    # now wrap it
    $text = Text::Wrap::wrap('', '', $text);
    # iterate over chunks, escaping HTML and linking
    $text =~ s{(\S+)}
              {
                local $_ = $1;
                m/^(?:http|ftp)/ ? qq!<a href="$_">! . escapeHTML($_) . qq!</a>! :
                m/^www\./        ? qq!<a href="http://$_">! . escapeHTML($_) . qq!</a>! :
                m/^[^@]+@[^@]/   ? qq!<a href="mailto:$_">! . escapeHTML($_) . qq!</a>! :
                                   escapeHTML($_);
              }ge;
    # fix whitespace (and newlines if not using pre tags)
    $text =~ s/( {2,})/"&nbsp;" x length $1/eg;
    #$text =~ s/\n/<br>\n/g;
    # wrap in pre tags
    $text = '<pre>' . $text . '</pre>';
    return $text;
}

sub escapeHTML {
    my ($escape) = @_;
    return '' unless defined $escape;
    $escape =~ s/&/&amp;/g;
    $escape =~ s/"/&quot;/g;
    $escape =~ s/</&lt;/g;
    $escape =~ s/>/&gt;/g;
    # anything outside 7-bit ASCII becomes a numeric entity
    $escape =~ s/([^\000-\177])/'&#' . (sprintf "%3d", ord $1) . ';'/eg;
    return $escape;
}
```

cheers
tachyon
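Since the original complaint was about Cyrillic text, one detail of the numeric-entity step in escapeHTML above may be worth spelling out: `ord` only returns the real code point if the string holds decoded characters rather than raw bytes. The following is a minimal sketch, assuming the input arrives as UTF-8 bytes (the thread does not say which encoding SiGiN's text uses); `escape_chars` is a hypothetical stand-in for the routine above, not a library function.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper mirroring the escapeHTML idea: escape markup
# characters, then turn every non-ASCII character into a numeric entity.
sub escape_chars {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/eg;
    return $s;
}

# UTF-8 bytes of the Cyrillic word "Privet" followed by a tag (assumed input)
my $bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 <b>";

# Decode bytes into characters so ord() sees code points, not single bytes
my $chars = decode('UTF-8', $bytes);

print escape_chars($chars), "\n";
# prints: &#1055;&#1088;&#1080;&#1074;&#1077;&#1090; &lt;b&gt;
```

Without the `decode` step, each Cyrillic letter would be emitted as two separate byte-valued entities, which a browser would render as garbage.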
Re: Text to HTML convert. by Grygonos (Chaplain) on Oct 07, 2004 at 14:30 UTC
I don't have any pre-dev'd solution for you; however, may I recommend the following (just off the top of my head):

```perl
use strict;
use warnings;

# Just listing a few bad chars (I think hehe) for example
my %escape_chars = (
    '>' => '&gt;',
    '<' => '&lt;',
);

# Text for email would either be in array or scalar I'll presume
my $email_text = 'Hi this is a math email 2 + 2 = 4. and 4 > 5';
my @email_text = ('Hi this is a math email', '2 + 2 = 4. and 4 > 5');

# if scalar
foreach my $key (keys %escape_chars) {
    $email_text =~ s/\Q$key\E/$escape_chars{$key}/g;
}

# if each line of email is in an array
foreach my $key (keys %escape_chars) {
    s/\Q$key\E/$escape_chars{$key}/g for @email_text;
}
```

This is just a method for finding characters that need escaping and changing them to their HTML counterparts. I believe (don't know for sure, obviously) that this is something like what you would need to do. I believe there is a module that has an http link regex that you can use. Also, just as a feature thought: you could test the links you find to make sure they are valid (read: not broken) links.
Grygonos
Re^2: Text to HTML convert. by muntfish (Chaplain) on Oct 07, 2004 at 15:07 UTC
You also need to escape the & (ampersand) character, as this is also a reserved char in HTML. I dug this code out of one of my old scripts. I'm not saying it's perfect, by any means. In particular, it can treat things as URLs when they are not well formed.

```perl
while (<F>) {
    # escape the ampersand first, so the entities added below survive
    s/&/&amp;/g;
    s/</&lt;/g;
    s/>/&gt;/g;
    # link URLs and e-mail addresses
    s!(https?://[-~@=_%;&/\+\.\?a-zA-Z0-9]+)!<a href="$1">$1</a>!g;
    s!([-_\+\.a-zA-Z0-9]+@[-_\+\.a-zA-Z0-9]+)!<a href="mailto:$1">$1</a>!g;
    # then print $_, or whatever you want to do with it
}
```

But I hope it helps you get started. Other monks are (as always) welcome to suggest improvements or point out flaws!
Update: The above assumes the whole output is wrapped in `<pre>` tags; otherwise the original paragraph/line formatting won't be preserved.
Update 2: Actually, the CPAN module HTML::FromText as suggested by jeffa looks like a better idea. I'll have to look into it myself...
s^^unp(;75N=&9I<V@`ack(u,^;s\|$.+\`\|"$`$'\"$&\"$"\|ee;/m.+h/&&print$&
Re^3: Text to HTML convert. by jeffa (Bishop) on Oct 07, 2004 at 16:50 UTC
Use the `encode_entities()` function from HTML::Entities. :)
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
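As a rough illustration of that suggestion, the sketch below replaces the three hand-written substitutions with a single call. It assumes the non-core CPAN module HTML::Entities is installed; the sample string and the variable names are made up for the example. The second argument restricts escaping to the listed characters, so everything else passes through unchanged.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Sample input, invented for this example
my $line = 'a < b && "c"';

# Escape only <, >, & and the double quote; all other characters are kept
my $escaped = encode_entities($line, '<>&"');

print "$escaped\n";
# prints: a &lt; b &amp;&amp; &quot;c&quot;
```

Called with a single argument, `encode_entities` also escapes characters outside ASCII, which may or may not be what you want for non-Latin text.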
Re: Text to HTML convert. by Intrepid (Curate) on Oct 07, 2004 at 19:57 UTC
I've had a lot of fun with HTML::TextToHTML. It is the most capable and configurable of any HTML-from-text generator I've yet seen, and I've looked at many of them. HTH.
Soren A / somian / perlspinr / Intrepid
-- use PerlMonk::Tye qw(:wisely);
Re: Text to HTML convert. by swaroop.m (Monk) on Oct 08, 2004 at 09:42 UTC
Hi, use the HTML::Parser module.

```perl
## Use the module HTML::Parser
use HTML::Parser;

my $returnText;
my $objParser;

## Take the function parameters into local variables
my ($sourceFilePath) = "1.html";

## Mention that the parsed file should be converted to text file
$objParser = HTML::Parser->new(text_h => [ sub { $returnText .= shift }, 'dtext' ])
    || die "Unable to create an object of HTML::Parser :: $!";

## Parse the html file
$objParser->parse_file($sourceFilePath);
print $returnText;
```

Hope this helps