Weird situation...

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello brother monks...

I have a series of lines throughout some generated output which come in the form of:

</a>
<!-- rem_me --></li>
<li>

I planned on replacing all instances of the middle line:

</li>

with <ul>

One would think that in my situation this would be a simple case of:

$html =~ s/<\/li>/<ul>/sig;

But strangely this didn't work... A quick search with a hex editor shows nothing out of the ordinary... There are two 0A characters on either side of the </li> line, which can only be attributed to the \n characters...

So what am I missing here?

Regards,

Fib Jones


Replies are listed 'Best First'.
Re: Weird situation... by GrandFather (Saint) on Mar 19, 2008 at 23:42 UTC
The general answer is "Parsing HTML is hard. Use a tool for it." Have a look on CPAN; there are plenty of HTML modules there. HTML::Sanitizer or HTML::Parser is likely to be the most useful in this case.

Perl is environmentally friendly - it saves trees
Re: Weird situation... by ww (Archbishop) on Mar 20, 2008 at 02:27 UTC
Just checking to make sure you REALLY want to do that: opening a new `<ul>` as a replacement for the comment plus `</li>` will result in a nested list, which will be doubly indented and will require an additional `</ul>` at some point. If you're dealing with that separately, you'll be fine; if not, your results may not be what you expect, and your HTML will surely be ill-formed and non-compliant.
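The nesting concern above can be checked mechanically. What follows is only a rough sketch, not anything posted in the thread: the sample fragment is hypothetical, and counting tags with a regex is a quick sanity check rather than real HTML parsing.

```perl
use strict;
use warnings;

# Hypothetical fragment of generated output containing one marker line.
my $html = join "\n",
    '<ul>',
    '<li>',
    '<a href="#">first</a>',
    '<!-- rem_me --></li>',
    '<li>second</li>',
    '</ul>',
    '';

# The substitution from the question: marker plus </li> becomes <ul>.
$html =~ s/<!-- rem_me --><\/li>/<ul>/gi;

# Count opening and closing <ul> tags using the "countof" idiom:
# assigning the list of matches to () and then to a scalar yields the count.
my $opened = () = $html =~ /<ul\b/gi;
my $closed = () = $html =~ /<\/ul\s*>/gi;

printf "opened: %d, closed: %d, unclosed: %d\n",
    $opened, $closed, $opened - $closed;
```

With this sample, the substitution leaves one more `<ul>` opened than closed, so whatever generates the output would also have to emit the matching `</ul>`.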
Re: Weird situation... by fibonacci_jones (Initiate) on Mar 19, 2008 at 23:23 UTC
Let me try this again... the code didn't show up properly!

Hello brother monks...

I have a series of lines throughout some generated output which come in the form of:

</a>
<!-- rem_me --></li>
<li>

I planned on replacing all instances of the middle line:

<!-- rem_me --></li>

with `<ul>`.

One would think that in my situation this would be a simple case of:

$html =~ s/<!-- rem_me --><\/li>/<ul>/sig;

But strangely this didn't work... A quick search with a hex editor shows nothing out of the ordinary... There are two 0A characters on either side of the line, which can only be attributed to the \n characters...

So what am I missing here?

Regards,

Fib Jones
Re^2: Weird situation... by ikegami (Patriarch) on Mar 19, 2008 at 23:32 UTC
Works for me, as I would expect it to.

my $html = do { local $/ = undef; <DATA> };
$html =~ s/<!-- rem_me --><\/li>/<ul>/sig;
print($html);

__DATA__
</a>
<!-- rem_me --></li>
<li>

Output:

</a>
<ul>
<li>

What's `rem_me` in reality?
Re^3: Weird situation... by fibonacci_jones (Initiate) on Mar 19, 2008 at 23:38 UTC
It (`<!-- rem_me -->`) was a marker for the generated output... Not all instances of:

</a>
</li>
<li>

were in need of being changed, so this marker was created in the hope that the above replacement would pick it up...

Why wouldn't it work on my end? There are about five different instances, and there's no way I can just hardcode this directly into the page, as the output is dynamic.
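The thread never establishes why the substitution failed on the poster's side. One way to narrow it down is sketched below, on the assumption that the real text differs from the pattern by some stray whitespace. The sample string and its extra spaces are hypothetical, not taken from the actual output.

```perl
use strict;
use warnings;

# Hypothetical sample: the marker line with extra spaces that a literal
# pattern would not match (an assumption made for illustration only).
my $html = "</a>\n<!--  rem_me -->  </li>\n<li>\n";

# 1. Does the exact literal text occur at all? index() returns -1 if not.
my $literal = '<!-- rem_me --></li>';
my $pos     = index($html, $literal);
print "literal marker ",
    ($pos >= 0 ? "found at offset $pos" : "not found"), "\n";

# 2. Print a copy with every space, newline, and other non-printable
#    character shown as a \xNN escape, so stray characters become visible.
(my $visible = $html) =~ s/([^\x21-\x7e])/sprintf('\\x%02X', ord $1)/ge;
print "$visible\n";

# 3. Try a more forgiving substitution that allows optional whitespace
#    inside the comment and before </li>. s///g returns the number of
#    replacements made, or a false value if there were none.
my $count = ($html =~ s/<!--\s*rem_me\s*-->\s*<\/li>/<ul>/gi);
print "substitutions made: ", ($count || 0), "\n";
```

If step 1 reports the literal as found but the original `s///` still changes nothing, the likelier explanation is that the substitution runs on a different string than the one being printed, or runs before the marker is added.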
Re^4: Weird situation... by ikegami (Patriarch) on Mar 20, 2008 at 01:40 UTC
Re^5: Weird situation... by Anonymous Monk on Mar 20, 2008 at 09:22 UTC