Re: Substitution inside tags, as 1 line
by Narveson (Chaplain) on Oct 14, 2008 at 04:38 UTC
Purists are cringing at your apparent belief that <p> marks the end of a paragraph. It marks the beginning of a paragraph, which is then terminated by </p>. Your confusion is widespread and pardonable, because the terminal </p> is optional, and your orphan line at the beginning will usually be rendered exactly like a paragraph.
So here's how to do what you are trying to do:
s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms
This assumes, as you do, that the opening <pre> is on a line of its own. I further assume that you start with no markup of any kind in your <pre> block. The substitution puts <p> at the end of each line that doesn't yet contain markup.
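A minimal sketch of how that substitution behaves. The three-line sample input is made up, and the `1 while` loop is an assumption about how you would drive it: a single pass only tags the first line that has no <p> yet, so the substitution is repeated until it stops matching.

```perl
use strict;
use warnings;

# Hypothetical sample input: a <pre> block with no markup inside it.
my $html = "<html>\n<pre>\nLine 1\nLine 2\nLine 3\n</pre>\n</html>\n";

# $1 = "<pre>\n" plus any lines already ending in <p>
# $2 = the first line that contains no ">" (i.e. not yet tagged)
# $3 = the rest of the block up to and including </pre>
# Each pass tags one more line; the loop stops when nothing untagged is left.
1 while $html =~ s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms;

print $html;
```

Once every line inside the block ends in <p>, the only candidate for $2 is the </pre> line itself, which contains a ">" before its newline, so the match fails and the loop ends.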
I think my attempt may be the kind of thing you're looking for, but you may find further problems with this approach. Before you spend too much more time on this regex, I'd advise you to either process the file line-by-line (as you're already thinking of doing), or better yet, drop regexes altogether and learn about parsers.
Both m and s options on s///?
e Evaluate the right side as an expression.
g Replace globally, i.e., all occurrences.
i Do case-insensitive pattern matching.
m Treat string as multiple lines.
o Compile pattern only once.
s Treat string as single line.
From Programming Perl, 3rd Edition, by Larry Wall et al., p. 153:
/m Let ^ and $ match next to embedded \n.
/s Let . match newlines and ignore deprecated $*.
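The two modifiers are independent, so using both on one pattern is fine. A small sketch to see them side by side; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "first\nsecond\n";

# /m changes where ^ and $ can match.
my @no_m   = $text =~ /^(\w+)/g;    # ^ only at the start of the string
my @with_m = $text =~ /^(\w+)/mg;   # ^ also right after each embedded newline

# /s changes what . can match.
my $no_s   = ($text =~ /first.second/)  ? 1 : 0;   # . refuses the newline
my $with_s = ($text =~ /first.second/s) ? 1 : 0;   # . accepts the newline

print "without /m: @no_m\n";
print "with /m:    @with_m\n";
print "without /s: $no_s\n";
print "with /s:    $with_s\n";
```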
perl -0 -pe '1 while (s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms);s/<\/?pre>//g' htmlfile
?
My full answer would be:
Perhaps you'll manage to get this to work, but really, regexes, wonderful as they are, are the wrong tool here. I offered a bit of code in the spirit of "Don't you see how hairy this is going to have to be?"
Parse your HTML. wfsp has been kind enough to furnish details.
Re: Substitution inside tags, as 1 line
by NetWallah (Canon) on Oct 14, 2008 at 06:30 UTC
Try using the flip-flop operator:
perl -pe 'm|<pre>|...m|</pre>| and $_.="<p/>"' < your-html-file
Output :
<html>
...etc...
<pre>
<p/>Line 1
<p/>Line 2
<p/>...etc...
<p/>Line n
<p/></pre>
<p/>...etc...
</html>
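For anyone who finds the one-liner too terse, here is the same idea spelled out as a small script. This is only a sketch: the sample lines and variable names are invented, and the input is an in-memory list rather than a file.

```perl
use strict;
use warnings;

# Hypothetical sample input, one element per line.
my @lines = ("<html>\n", "<pre>\n", "Line 1\n", "Line 2\n",
             "</pre>\n", "after\n", "</html>\n");

my $out = '';
for my $line (@lines) {
    # In boolean context, ... is the flip-flop operator: it is false until
    # the left pattern matches, then stays true up to and including the
    # line where the right pattern matches.
    if ($line =~ m|<pre>| ... $line =~ m|</pre>|) {
        $line .= "<p/>";   # appended after the newline, so it shows up
                           # at the start of the following line
    }
    $out .= $line;
}
print $out;
```

Because the marker is appended after each line's newline, the <pre> and </pre> lines get one too, which is why <p/> shows up in front of </pre> and in front of the first line after the block.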
Have you been high today? I see the nuns are gay! My brother yelled to me...I love you inside Ed - Benny Lava, by Buffalax
Slightly uglier, getting rid of the extra </p> around the <PRE>, but still readable:
perl -pe 'm|<pre>|..m|</pre>| and {m|</?pre>| or $_=qq|<p>$_</p>|} ' < Your-file
Also removed the unnecessary empty paragraph (per mortiz), and kept it XHTML-compatible!
Re: Substitution inside tags, as 1 line
by wfsp (Abbot) on Oct 14, 2008 at 11:41 UTC
...as 1 line.
How about 44? :-)
This uses a parser to get the data you need and a template to put it all back together again.
Over the top? Possibly. I have a particular aversion to having any HTML in my code, even more so in a regex. It almost always ends in tears. This way I have no HTML in the code at all (the source and the template would normally be in separate files). YMMV.
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
use HTML::Template;

my $p = HTML::TokeParser::Simple->new(\get_html());
my ($in_pre, $pre);

# Collect everything between <pre> and </pre>
while (my $t = $p->get_token){
    $in_pre++, next if $t->is_start_tag(q{pre});
    next unless $in_pre;
    last if $t->is_end_tag(q{pre});
    $pre .= $t->as_is;
}

# One template row per non-blank line
my @lines = grep{/\S/} split /\n/, $pre;
my $tmpl = HTML::Template->new(scalarref => \get_tmpl());
my @loop = map{{line => $_}} @lines;
$tmpl->param(loop => \@loop);
print $tmpl->output;

sub get_html{
    return <<HTML;
<html>
<pre>
line 1
line 2
line 3
</pre>
</html>
HTML
}

sub get_tmpl{
    return <<TMPL;
<html>
<TMPL_LOOP loop>
<p><TMPL_VAR line></p>
</TMPL_LOOP>
</html>
TMPL
}
<html>
<p>line 1</p>
<p>line 2</p>
<p>line 3</p>
</html>
Hi wfsp.
I finally made time to test your solution, and thanks very much for your input. Nice work! While I don't think my situation warrants using your code, I may well use it in future if I have a more complex problem to deal with, and I appreciate the time you took to demonstrate this method.
BTW: The single line processing requirement I gave was about the way I wanted to treat the htmlfile, rather than the number of lines of code.
Thanks again.
Re: Substitution inside tags, as 1 line
by Perlbotics (Archbishop) on Oct 14, 2008 at 15:36 UTC
perl -pe 'chomp; $s=!$s,next if s/^\s*<\/?pre>\s*$//i; $_="<p>$_</p>" if $s; $_.="\n";' <in >out
... makes ...
in:             out:
----------------------------------
<html>          <html>
...etc...       ...etc...
<pre>           <p>Line 1</p>
Line 1          <p>Line 2</p>
Line 2          <p>...etc...</p>
...etc...       <p>Line n</p>
Line n          ...etc...
</pre>          </html>
...etc...
</html>
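The same toggle logic written out as a short script, in case the one-liner is hard to read. This is a sketch under assumptions: the sample lines and the names $inside and $out are made up, and the input is an in-memory list instead of a file.

```perl
use strict;
use warnings;

# Hypothetical sample input, one element per line.
my @in = ("<html>\n", "<pre>\n", "Line 1\n", "Line 2\n",
          "</pre>\n", "</html>\n");

my $inside = 0;    # flipped by every line holding only <pre> or </pre>
my $out    = '';
for my $line (@in) {
    chomp(my $text = $line);
    if ($text =~ /^\s*<\/?pre>\s*$/i) {
        $inside = !$inside;    # toggle the state and drop the tag line
        next;
    }
    $text = "<p>$text</p>" if $inside;    # wrap only lines inside the block
    $out .= "$text\n";
}
print $out;
```

A single toggle works for both tags because each <pre> is assumed to be followed by exactly one </pre>, each on a line of its own; nested or inline <pre> tags would confuse it.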
Weeks later...
I like it, Perlbotics.
Thanks for that.