Re: Substitution inside tags, as 1 line

Purists are cringing at your apparent belief that <p> marks the end of a paragraph. It marks the beginning of a paragraph, which is then terminated by </p>. Your confusion is widespread and pardonable, because the terminal </p> is optional, and your orphan line at the beginning will usually be rendered exactly like a paragraph.

So here's how to do what you are trying to do:

s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms

This assumes, as you do, that the opening <pre> is on a line of its own. I further assume that you start with no markup of any kind in your <pre> block. Each time it runs, the substitution puts <p> at the end of the first line that doesn't yet contain markup, so it has to be repeated until every line has been tagged.
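To see that substitution in action, here is a minimal sketch that wraps it in a loop and runs it on a made-up sample. The sub name add_p_in_pre and the sample text are only for illustration, and it assumes the tag being appended is <p>.

```perl
use strict;
use warnings;

# Hypothetical helper: append <p> to each markup-free line inside a <pre> block.
sub add_p_in_pre {
    my ($html) = @_;
    # Each pass tags the first line that has no <p> yet;
    # the loop ends when the substitution no longer matches.
    1 while $html =~ s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms;
    return $html;
}

# Made-up sample input.
my $sample = "<pre>\nfirst line\nsecond line\n</pre>\n";
print add_p_in_pre($sample);
```

The loop stops by itself because the closing </pre> line contains a ">", so it can never match the `[^>\n]*` part.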

I think my attempt may be the kind of thing you're looking for, but you may find further problems with this approach. Before you spend too much more time on this regex, I'd advise you to either process the file line-by-line (as you're already thinking of doing), or better yet, drop regexes altogether and learn about parsers.
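For comparison, a rough sketch of the line-by-line route. The sub name add_p_line_by_line is invented for this example, and it makes the same assumptions as above: <pre> sits on a line of its own, and lines that already contain markup are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the text one line at a time,
# remembering whether we are currently inside a <pre> block.
sub add_p_line_by_line {
    my ($html) = @_;
    my $in_pre = 0;
    my @out;
    for my $line (split /^/, $html) {    # split /^/ keeps each line's newline
        if ($line =~ /^<pre>\s*$/) {
            $in_pre = 1;                 # opening tag on a line of its own
        }
        elsif ($line =~ m{</pre>}) {
            $in_pre = 0;                 # closing tag ends the block
        }
        elsif ($in_pre && $line !~ />/) {
            chomp(my $text = $line);     # drop the newline, if any
            $line = "$text<p>\n";        # append the tag to a markup-free line
        }
        push @out, $line;
    }
    return join '', @out;
}

print add_p_line_by_line("<pre>\nfirst line\nsecond line\n</pre>\n");
```

The state flag replaces all the backtracking the regex needs, which is why this version is easier to extend when the input stops matching the assumptions.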


Re^2: Substitution inside tags, as 1 line by Anonymous Monk on Oct 14, 2008 at 07:14 UTC
Both m and s options on s///?
`e  Evaluate the right side as an expression.`
`g  Replace globally, i.e., all occurrences.`
`i  Do case-insensitive pattern matching.`
`m  Treat string as multiple lines.`
`o  Compile pattern only once.`
`s  Treat string as single line.`
Re^3: Substitution inside tags, as 1 line by tel2 (Pilgrim) on Oct 14, 2008 at 09:01 UTC
From Programming Perl, 3rd Edition, by Larry Wall et al., p. 153:
`/m  Let ^ and $ match next to embedded \n.`
`/s  Let . match newlines and ignore deprecated $*.`
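A small sketch of why the two modifiers don't conflict: /m only changes what ^ and $ do, and /s only changes what . does. The sample string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;
# With /m, ^ also matches right after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;

# Without /s, . refuses to match a newline; with /s it matches one.
my $dot_plain = $text =~ /one.two/  ? 1 : 0;
my $dot_s     = $text =~ /one.two/s ? 1 : 0;

print "without /m: @plain\n";
print "with /m:    @multi\n";
print "dot crosses newline without /s: $dot_plain, with /s: $dot_s\n";
```

Since they affect different metacharacters, using both on one s/// is fine.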
Re^2: Substitution inside tags, as 1 line by tel2 (Pilgrim) on Oct 14, 2008 at 08:44 UTC
Thanks Narveson, nice work! So would your full answer be to:
- put that in a while loop, and
- add the <pre> & </pre> tag removal code,
like this?
`perl -0 -pe '1 while (s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms);s/<\/?pre>//g' htmlfile`
Re^3: Substitution inside tags, as 1 line by Narveson (Chaplain) on Oct 14, 2008 at 14:06 UTC
My full answer would be: Perhaps you'll manage to get this to work, but really, regexes, wonderful as they are, are the wrong tool here. I offered a bit of code in the spirit of "Don't you see how hairy this is going to have to be?" Parse your HTML. wfsp has been kind enough to furnish details.