
I understand your point, but the following will break your regex:

<TD class="foo">   # you don't allow for attributes
<td>               # you assumed upper-case
<TD                # it's annoying, but legal, to have a newline there
>

If the last example seems contrived, I can assure you that it's not. I've had the misfortune of dealing with HTML written like that :) Further, that's the example that pretty much guarantees no amount of tweaking will get your regex to handle it. Sad, but true.

If it makes you feel any better, you can get an idea of the scope of the problem of using regular expressions with HTML by reading about my sordid history making the same darned mistake.

Cheers,
Ovid

Update: chicks has updated the original code snippet so that my comments and those of Mr. Muskrat don't appear to make sense. I think it would have been appropriate for chicks to make note of that. The original snippet resembled the following (I can't recall it exactly):

$content =~ s/^(?!\s*<TD>).*$//mg;
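To make the three problem tags concrete, here is a minimal sketch that runs a pattern like the one above over some made-up input. The sample rows and the variable names $filtered and @kept are my own assumptions for illustration, and the pattern is only the approximate one quoted above, not necessarily what was originally posted:

```perl
use strict;
use warnings;

# Hypothetical sample input: four cells we would like to keep, plus one junk line.
my $content = join "\n",
    '<TD>plain cell</TD>',
    '<TD class="foo">cell with an attribute</TD>',
    '<td>lower-case cell</td>',
    '<TD',
    '>cell whose tag spans two lines</TD>',
    'junk line';

# Approximate pattern from the post: blank out every line that
# does not start with optional whitespace followed by a literal <TD>.
(my $filtered = $content) =~ s/^(?!\s*<TD>).*$//mg;

# Collect the lines that were not blanked out.
my @kept = grep { length } split /\n/, $filtered;
print "$_\n" for @kept;
```

With this input only the first row survives. The attribute, lower-case, and split-across-lines variants are all blanked along with the junk line.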


Re: Re: Re: Re: kill all lines that don't start with something
by chicks (Scribe) on May 10, 2002 at 20:06 UTC
    The point of my post was not meant to be HTML related in any way. I've been down those roads too. I'm working with HTML generated by a database and it's very consistent. I can also assure you that the work involved in doing it with the "proper" HTML tools would have far outweighed throwing a handful of regexes at it.