PerlMonks  

ASCII to HTML

by skazat (Chaplain)
on Apr 04, 2000 at 23:04 UTC ( [id://6866]=perlquestion )

skazat has asked for the wisdom of the Perl Monks concerning the following question:

heya, i'm looking for a cool little regex to change some info from a <textarea> (in a $scalar) into HTML, maybe a bit more complex than

    $ascii =~ s/\n\n/<\/p><p>/g;

as that's not doing a very good job. it seems every gosh darn newline gets its own paragraph, and it looks a bit hokey. anyone got a good one?

Re: ASCII to HTML
by btrott (Parson) on Apr 04, 2000 at 23:16 UTC
    I'm not sure why that's not working for you, actually... it looks like it would, from what I can tell, and from the little test I did.

    But anyway, depending on how complicated your text is, you may need something more powerful than just a regex. Take a look at HTML::FromText, which formats your text into HTML. It can handle a lot more formatting issues than just paragraphs.

    use HTML::FromText;

    my $str = <<TEXT;
    Foo is on this line,
    and bar is in this paragraph.

    Baz is in a new paragraph.
    TEXT

    print text2html($str, paras => 1);

    The result:

    <P>Foo is on this line,
    and bar is in this paragraph.</P>

    <P>Baz is in a new paragraph.</P>

    If you decide to go with a regex, I've always just used
    $str =~ s/\n\n/<p>\n\n/g;
    $str =~ s/\n/<br>\n/g;

    which first replaces double-newlines and makes them paragraphs, and then formats line breaks. Plus it keeps the newlines there as a visual distinction, in case anyone actually needs to *read* the HTML. :)
      Whoa! That second regex will Do Weird Things after the first. (It picks up the newlines after the fresh <p> tag.) Here's another approach:

      my $str = <<TEXT;
      Foo is on this line,
      and bar is in this paragraph.

      Baz is in a new paragraph.
      TEXT
      my @para = split(/\n\n/, $str);
      s!\n!<br>\n!g foreach @para;
      $str = join "\n<p>\n", @para;
      print ">>$str<<\n";

      For extra credit, put that in a one-liner. *sigh*
        Oops oops oops. Thanks for catching that--I guess I must not always use that regex. :)

        But here's your one-liner:

        $str = join "\n<p>\n", grep s/\n/<br>\n/g || 1, split /\n\n/, $str;
        (The "|| 1" in there makes it so that even paragraphs that don't contain any newlines inside of them, and thus don't match in the substitution, still get included in the final list of paragraphs: s///g returns false when it makes no replacements, so grep would otherwise drop them.)
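When paragraphs may be separated by more than one blank line, or by "blank" lines that actually hold a few spaces, splitting on a looser separator avoids producing empty paragraphs. This is a minimal sketch, not code posted in the thread: `paras_to_html` is a hypothetical name, it wraps each paragraph in `<p>...</p>` instead of joining with a bare `<p>`, it assumes Unix "\n" line endings, it drops leading whitespace on a paragraph's first line, and it does not escape `&`, `<` or `>`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert plain text into
# <p>...</p> paragraphs. Assumes "\n" line endings and does no escaping.
sub paras_to_html {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;                   # trim leading/trailing whitespace
    my @paras = split /\n[ \t]*\n\s*/, $text;  # a run of blank lines = one break
    s{\n}{<br>\n}g for @paras;                 # single newlines become <br>
    return join "\n", map { "<p>$_</p>" } @paras;
}

print paras_to_html("Foo is on this line,\nand bar is in this paragraph.\n\n\nBaz is in a new paragraph.\n"), "\n";
```

With the sample input above (two blank lines between the paragraphs), this prints `<p>Foo is on this line,<br>` / `and bar is in this paragraph.</p>` / `<p>Baz is in a new paragraph.</p>` on three lines, with no empty paragraph in between.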
      i too have been using this: $str =~ s/\n\n/<p>/g; to format text going into a flat-file db, obviously to keep a newline from being written to the db, which would cause the entry to skip. but strange thing: it keeps the entry from going to a new line like i wanted, yet it puts a "square" (newline character) before the "<p>" in the entry.

      for the life of me, i can't figure out why. any suggestions? thanks much. tdidy

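The "square" is most likely a carriage return. This is an assumption, not something confirmed in the thread: browsers submit `<textarea>` line breaks as CR LF (`"\r\n"`), and substitutions that only match `\n` leave each `\r` behind, which many editors display as a box. A minimal sketch that normalizes line endings before converting them (`flatten_for_db` is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: flatten textarea input to a single line for a
# flat-file record. Assumes the input may contain CR LF ("\r\n") or bare
# CR line endings, as browser form submissions commonly do.
sub flatten_for_db {
    my ($str) = @_;
    $str =~ s/\r\n?/\n/g;    # normalize CR LF and lone CR to LF
    $str =~ s/\n\n/<p>/g;    # blank line -> paragraph tag
    $str =~ s/\n/<br>/g;     # remaining newlines -> line break
    return $str;
}

my $raw = "first line\r\nsecond line\r\n\r\nnew paragraph";
print flatten_for_db($raw), "\n";
# prints: first line<br>second line<p>new paragraph
```

Without the first substitution, the same input would come out as `"first line\r<br>second line\r<br>\r<br>new paragraph"`, with a carriage return in front of each tag.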
Re: ASCII to HTML
by chromatic (Archbishop) on Apr 04, 2000 at 23:18 UTC
    Most of the regexes I have seen do something like this:
    $ascii =~ s/\n\n/<p>/g;
    $ascii =~ s/\n/<br>/g;
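A quick way to see what these two substitutions do when the input has more than one blank line between paragraphs: a run of three newlines leaves one newline behind after the first substitution, and the second turns it into a stray `<br>` right after the `<p>`. The `\n{2,}` variant below is an assumption added for comparison, not something posted in the thread; both sub names are hypothetical.

```perl
use strict;
use warnings;

# The two substitutions from the reply above, wrapped in a sub for testing.
sub two_pass {
    my ($s) = @_;
    $s =~ s/\n\n/<p>/g;    # blank line -> paragraph tag
    $s =~ s/\n/<br>/g;     # any remaining newline -> line break
    return $s;
}

# Variant (assumption): treat any run of two or more newlines as a
# single paragraph break.
sub two_pass_collapsed {
    my ($s) = @_;
    $s =~ s/\n{2,}/<p>/g;
    $s =~ s/\n/<br>/g;
    return $s;
}

my $input = "FOO\nBAR\n\n\nBAZ\n";        # two blank lines between BAR and BAZ
print two_pass($input), "\n";             # FOO<br>BAR<p><br>BAZ<br>
print two_pass_collapsed($input), "\n";   # FOO<br>BAR<p>BAZ<br>
```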
Re: ASCII to HTML
by little_mistress (Monk) on Apr 05, 2000 at 01:58 UTC
    Well both examples create an odd thing:
    <br> FOO<br> BAR<p><br> <br> BAZ<p><br> <br>

    notice the extra break tags? whereas a set of expressions like this will do the job:

    $text = <<TEXT;
    FOO
    BAR

    BAZ
    TEXT
    $text =~ s/\n{2,2}/<p>/g;
    $text =~ s/\n{1,1}/<br>/g;
    $text =~ s/</\n</g;
    $text =~ s/>/>\n/g;
    print $text;

    gives us this result

    FOO
    <br>
    BAR
    <p>
    BAZ
    <br>

    which i think is a little more of what you are looking for. I would suggest looking at O'Reilly's Mastering Regular Expressions book. I left out some optimizations so you could go hunting for them.

    take care

    little_mistress@mainhall.com
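One possible tightening of the code above, which may or may not be the optimizations little_mistress had in mind: `\n{2,2}` is just `\n\n`, `\n{1,1}` is just `\n`, and the two tag-splitting substitutions can be folded into a single capture. Folding them assumes the text holds no literal `<` or `>` characters besides the generated tags.

```perl
use strict;
use warnings;

my $text = <<TEXT;
FOO
BAR

BAZ
TEXT

$text =~ s/\n\n/<p>/g;            # same as \n{2,2}
$text =~ s/\n/<br>/g;             # same as \n{1,1}
$text =~ s/(<[^>]*>)/\n$1\n/g;    # put each tag on its own line
print $text;
```

This prints the same six lines as the original version: FOO, `<br>`, BAR, `<p>`, BAZ, `<br>`.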

Re: ASCII to HTML
by turnstep (Parson) on Apr 05, 2000 at 02:16 UTC

    Don't forget the classic

    $ascii = "<PRE>$ascii</PRE>";

    While it doesn't really turn the text into HTML paragraphs, it will probably emulate your textarea (including tabs!) better than any regex will. :)
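If the text goes inside `<PRE>` as-is, any `&`, `<` or `>` the user typed will be read as markup. A minimal sketch that escapes those three characters first; `pre_wrap` is a hypothetical name, and the `encode_entities` function from the `HTML::Entities` module is a ready-made alternative to the hand-rolled substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: show textarea input verbatim inside <pre>.
# &, < and > are escaped first so the browser doesn't treat them as markup.
sub pre_wrap {
    my ($ascii) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $ascii =~ s/([&<>])/$entity{$1}/g;
    return "<pre>$ascii</pre>";
}

print pre_wrap("if (a < b && c > d) {\n\treturn;\n}"), "\n";
```

Tabs and newlines pass through untouched, which is what lets `<pre>` reproduce the textarea's layout.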

      You just need to be careful on this one how you set up the textarea... if the textarea wraps the text automatically the browser doesn't necessarily insert linebreaks for you.
      So your users may type in an entire story with absolutely no linebreaks, and the <pre> tags will force it all on one line. In this case it's best to turn wrapping off in the textarea, so the output will be a faithful reproduction of what the user typed.

        Excellent point. I have a script that uses PRE tags and TEXTAREA input, and I have to manually wrap the long lines using some code that makes a good guess at where to insert a break if the line is too long. Why some people don't hit the 'Enter' key once in a while is beyond me! Just goes to show you always have to check for all possible cases when dealing with user input.

        I've gotten people who put over 1000 characters on one line. Sheesh!
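The wrapping code mentioned above isn't shown. As a stand-in, not the poster's method, here is a minimal sketch using the core `Text::Wrap` module, which breaks lines at whitespace; `hard_wrap` is a hypothetical name and the 72-column limit is an arbitrary choice.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: hard-wrap every over-long line so text shown
# inside <pre> doesn't run off the page. Existing line breaks are kept.
sub hard_wrap {
    my ($text) = @_;
    # Localize the package variable wrap() reads; resulting lines are
    # at most $columns - 1 characters long.
    local $Text::Wrap::columns = 72;
    # The -1 limit keeps trailing empty lines; each line is wrapped on its own.
    return join "\n", map { wrap('', '', $_) } split /\n/, $text, -1;
}

print hard_wrap("word " x 40), "\n";
```

Setting `$Text::Wrap::columns` by its full name matters here: `local` on an imported alias in `main` would not change the value `wrap()` actually sees.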

      Good Point!
RE: ASCII to HTML
by Anonymous Monk on Apr 05, 2000 at 09:47 UTC
    Do you really need a regex to do the substitutions? Will the <pre> tag meet your requirements?
      Hmm... forgot to convert the HTML tags; that was supposed to be a <pre> tag.

Approved by root