Re: regex to replace linefeeds with <p> tags
by liverpole (Monsignor) on Dec 25, 2006 at 21:33 UTC
Hi jck,
One simple method of fixing it immediately comes to mind.
Since it's HTML, why not just skip putting the newline before and after the </p> ... <p>:
$in{$_} =~ s/([\r\n]){2,}/<\/p><p>/g;
That way, you at least avoid the newline accumulation problem.
Update: If you really have your heart set on putting them on the same line, something like:
$in{$_} =~ s/(?!^<\/p><p>)([\r\n]){2,}/\n<\/p><p>\n/g;
Might do the trick. It uses a zero-width negative lookahead assertion, (?!...), which is meant to avoid adding </p> ... <p> to any line which contains that pair (and only that pair) already.
s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: regex to replace linefeeds with <p> tags
by ysth (Canon) on Dec 25, 2006 at 21:35 UTC
Your [\r\n] is suspicious. Does the data have \r or \n or \r\n?
If the latter, you don't want a character class, since that matches even a single "\r\n" when presumably you mean it to match only "\r\n\r\n...".
You might try stripping out all the \r's before doing the regex and using \n as the only line ending, with your regex looking for just \n{2,}.
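A minimal sketch of the point above, assuming a made-up sample string (the variable names are hypothetical, not from the original code). It shows the character class swallowing a single "\r\n", and the two-pass alternative leaving that single break alone:

```perl
use strict;
use warnings;

# Hypothetical sample: two lines separated by ONE Windows-style line break.
my $single_break = "first line\r\nsecond line";

# The character class counts "\r" and "\n" as two separate matches,
# so a single "\r\n" already satisfies {2,} and gets replaced.
(my $class_version = $single_break) =~ s/[\r\n]{2,}/<\/p><p>/g;
print "$class_version\n";    # prints: first line</p><p>second line

# Two-pass alternative: strip every "\r", then look for two or more "\n".
(my $two_pass = $single_break) =~ s/\r//g;
$two_pass =~ s/\n{2,}/<\/p><p>/g;
print "$two_pass\n";         # the single break is left alone
```

With the two-pass version, only a genuinely blank line (two or more consecutive "\n" after stripping) turns into a paragraph break.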
|
Those are good suggestions, but I would do this:
s/(\r?\n){2,}/\n<\/p>\n<p>\n/g
to make the \r optional, instead of doing 2 passes over the data. (and I added a \n in the replacement to make the results look a little nicer in my eyes)
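As a rough illustration of that one-pass substitution, here it is run against an invented input that mixes Windows and Unix line ends (the sample text and variable names are assumptions made for the example):

```perl
use strict;
use warnings;

# Hypothetical input: a single line break, then two paragraph breaks,
# one written with "\r\n" pairs and one with bare "\n".
my $text = "line one\r\nline two\r\n\r\nnext paragraph\n\nlast paragraph";

# "\r?" makes the carriage return optional, so both styles of
# blank line are caught in one pass; the single break is not.
(my $html = $text) =~ s/(\r?\n){2,}/\n<\/p>\n<p>\n/g;

print $html, "\n";
```

The single "\r\n" between "line one" and "line two" survives untouched, because the group has to repeat at least twice before the substitution fires.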
|
I'll second Joost's approach, but make a few suggestions for readability / maintainability.
- Use qw to de-clutter the list of strings.
- Use an explicit variable in the for; they are cheap and they make your intention clear.
- Use [ ] brackets as the regex delimiters so you won't have to backslash the slash. This de-emphasizes some of the executable line noise effect.
- Use the regex x modifier to put some whitespace and comments in here.
foreach my $field (qw(postby title teaser content)) {
    # /x ignores whitespace in the pattern only, so the replacement has none
    $in{$field} =~ s[ (\r? \n){2,} ]    # two or more line ends (CRLF or LF)
                    [\n</p>\n<p>\n]gx;  # Close one para, open another
}
throop
|
I didn't suggest that because then you are (assuming the data was consistent in the first place) leaving most lineends as "\r\n" but those at paragraph breaks as "\n", and that bothered me.
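To make that concern concrete, here is a small sketch. The second substitution is not from this thread; it is just one possible variant, under the assumption that you want the replacement to reuse whatever line end the data already had:

```perl
use strict;
use warnings;

# Hypothetical input that consistently uses "\r\n".
my $text = "line one\r\nline two\r\n\r\nnext paragraph";

# One-pass version from earlier in the thread: paragraph breaks come out as "\n".
(my $mixed = $text) =~ s/(\r?\n){2,}/\n<\/p>\n<p>\n/g;

# Hypothetical variant: capture the first line end and reuse it ($1)
# around the tags, so "\r\n" data stays "\r\n" throughout.
(my $consistent = $text) =~ s/(\r?\n)(?:\r?\n)+/$1<\/p>$1<p>$1/g;

# Count "\n" characters that are NOT preceded by "\r".
my $bare_in_mixed      = () = $mixed      =~ /(?<!\r)\n/g;
my $bare_in_consistent = () = $consistent =~ /(?<!\r)\n/g;

print "bare LF, one-pass version: $bare_in_mixed\n";       # 3
print "bare LF, reuse-\$1 variant: $bare_in_consistent\n";  # 0
```

Whether mixed line ends matter depends on what consumes the output; a browser will not care, but a later line-oriented pass over the data might.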
|
But the behaviour you describe indicates that the user input is coming back as the sequence \r\n for a single line break.
Re: regex to replace linefeeds with <p> tags
by jck (Scribe) on Dec 26, 2006 at 02:29 UTC
thanks to all for the great suggestions. they're all very helpful.
liverpole, i agree with you, and i was thinking that i would just leave out the linefeeds, but when the posts are long, i don't like seeing the text all strung together without easily seeing the paragraph breaks - just a preference thing.
a general question about the \r? that both Joost and throop suggest... i started out with \n{2,} but found that some of my users were cutting and pasting from word processors, and that introduced the occasional \r into the mix. so, will [ (\r? \n){2,} ] match "\r\r" ? that was what i was hoping would work with the \r\n{2,} - that it would match \r\r or \r\n or \n\r or \n\n (as well as \r\r\n and \r\n\r and \r\n\n and \n\n\r etc etc etc.....)
clearly, passing through twice, and changing any \r to \n and then matching the \n{2,} to replace to the para tags would be reliable, but seems inefficient.
|
Pasting from Windows environments will introduce \r\n because that's what Microsoft uses for linebreaks. It won't introduce \n\r or \r\r.
You don't want to introduce a <p> from a single <RETURN>, right? \r?\n is what you want to match.
If it's clearer to you, go ahead and remove all the \r in one pass and then handle the \n. Don't worry about efficiency here: you're doing I/O! The number of CPU cycles spent waiting for a response from the keyboard is enormous in comparison to the cycles needed for a string replace.
throop
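A short sketch of this exchange. The first check answers the "\r\r" question directly. The helper below it, paragraphs_to_tags, is a hypothetical name, and its first pass is a variation on the advice above: instead of deleting every \r, it converts "\r\n" or a lone "\r" to "\n", on the assumption that you do want a stray \r treated as a line end:

```perl
use strict;
use warnings;

# "\r\r" contains no "\n" at all, so (\r?\n){2,} cannot match it.
my $matches_cr_cr = ("\r\r" =~ /(\r?\n){2,}/) ? 1 : 0;
print "matches \\r\\r: $matches_cr_cr\n";    # 0

# Hypothetical helper: normalise line ends first, then tag paragraph breaks.
sub paragraphs_to_tags {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;                 # assumption: a lone "\r" is a line end too
    $text =~ s/\n{2,}/\n<\/p>\n<p>\n/g;    # blank line(s) become a paragraph break
    return $text;
}

print paragraphs_to_tags("one\r\ntwo\r\rthree"), "\n";
```

Here the single "\r\n" stays a plain line break, while "\r\r" becomes a paragraph break. If you would rather ignore stray carriage returns entirely, swap the first substitution for s/\r//g as suggested above.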
Re: regex to replace linefeeds with <p> tags
by j3 (Friar) on Dec 26, 2006 at 17:06 UTC
Hi jck,
Might not be too relevant here, but note that, in general, if you need to convert text to HTML you might have a look at Markdown. There's even a Text::Markdown module for it.
Re: regex to replace linefeeds with <p> tags
by f00li5h (Chaplain) on Dec 27, 2006 at 01:44 UTC
The Template Toolkit filter html_para may also help you. It wraps <p> tags around paragraphs (delimited by a blank line).
Merlyn will tell you how to use Template in this article, one of his spiffy Linux Magazine columns.
@_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;