I have a scalar that contains the contents of a file. I clean it up using a few regexes, and then I want to get rid of every line that doesn't start with a given prefix. Here is what I have:
$content =~ s/^(?!KEEP).*$//mg; # clear lines not starting with KEEP
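
For anyone trying this out, here is a small sketch of what that substitution does and two possible variants. The sample data is made up, and "KEEP" is just a stand-in for the real prefix. Note that the substitution as posted empties the unwanted lines but leaves their newlines behind; whether that matters depends on what happens to $content afterwards.

```perl
use strict;
use warnings;

# Hypothetical sample data; "KEEP" stands in for the real prefix.
my $content = "KEEP one\ndrop me\nKEEP two\nalso dropped\n";

# The substitution from the post: non-matching lines are emptied,
# but their newlines remain, so blank lines are left behind.
(my $blanked = $content) =~ s/^(?!KEEP).*$//mg;

# Variant that also consumes the trailing newline, removing the line entirely.
(my $removed = $content) =~ s/^(?!KEEP).*\n?//mg;

# The same filter without a lookahead: split into lines (keeping the
# newlines) and keep only those that start with KEEP.
my $filtered = join '', grep { /^KEEP/ } split /^/m, $content;

print $blanked, "---\n", $removed, "---\n", $filtered;
```

The grep version reads more directly as "keep these lines", while the substitution stays closer to a sed-style search and replace.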

Re: kill all lines that don't start with something
by Mr. Muskrat (Canon) on May 10, 2002 at 17:12 UTC
      I'm quite fond of that entire set of modules, but in this case I was expanding on the functionality of a sed script so keeping with the search and replace model fit quite nicely with the rest of the program.

        I understand your point, but the following will break your regex:

        <TD class="foo"> # you don't allow for attributes
        <td>             # you assumed upper-case
        <TD              # it's annoying, but legal, to have a newline there
        >

        If the last example seems contrived, I can assure you that it's not. I've had the misfortune of dealing with HTML written like that :) Further, that's the example which pretty much guarantees that no tweaks to your regex will handle that case. Sad, but true.
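
A quick way to see the three failures is to run each form through the filter. This is only a sketch: it assumes the original pattern was roughly s/^(?!\s*<TD>).*$//mg, as recalled in the update to this post, and the helper name survives and the sample cell contents are made up for illustration.

```perl
use strict;
use warnings;

# Returns 1 if any non-whitespace text is left after the filter runs, else 0.
# The pattern is the one recalled in the update, not necessarily the exact original.
sub survives {
    my ($html) = @_;
    (my $copy = $html) =~ s/^(?!\s*<TD>).*$//mg;
    return $copy =~ /\S/ ? 1 : 0;
}

# Hypothetical inputs mirroring the cases listed above.
my @cases = (
    [ 'plain <TD>'     => "<TD>cell</TD>\n" ],
    [ 'with attribute' => "<TD class=\"foo\">cell</TD>\n" ],
    [ 'lower-case'     => "<td>cell</td>\n" ],
    [ 'newline in tag' => "<TD\n>cell</TD>\n" ],
);

for my $case (@cases) {
    my ($label, $html) = @$case;
    printf "%-16s %s\n", $label, survives($html) ? 'kept' : 'dropped';
}
```

Only the plain <TD> line is kept. Adding /i and allowing attributes would cover the first two variants, but the tag split across lines cannot be handled by any pattern that looks at one line at a time.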

        If it makes you feel any better, you can get an idea of the scope of the problem of using regular expressions with HTML by reading about my sordid history making the same darned mistake.

        Cheers,
        Ovid

        Update: chicks has updated the original code snippet so that my comments and those of Mr. Muskrat don't appear to make sense. I think it would have been appropriate for chicks to make note of that. The original snippet resembled the following (I can't recall it exactly):

        $content =~ s/^(?!\s*<TD>).*$//mg;
