PerlMonks  

Re: Re: kill all lines that don't start with something

by chicks (Scribe)
on May 10, 2002 at 17:54 UTC ( [id://165732]=note )


in reply to Re: kill all lines that don't start with something
in thread kill all lines that don't start with something

I'm quite fond of that entire set of modules, but in this case I was expanding on the functionality of a sed script, so keeping to the search-and-replace model fit quite nicely with the rest of the program.

Re: Re: Re: kill all lines that don't start with something
by Ovid (Cardinal) on May 10, 2002 at 19:58 UTC

    I understand your point, but the following will break your regex:

    <TD class="foo"> # you don't allow for attributes
    <td>             # you assumed upper-case
    <TD              # it's annoying, but legal, to have a newline there
    >

    If the last example seems contrived, I can assure you that it's not. I've had the misfortune of dealing with HTML written like that :) Further, that's the example which pretty much guarantees that no tweaks to your regex will handle that case. Sad, but true.

    If it makes you feel any better, you can get an idea of the scope of the problem of using regular expressions with HTML by reading about my sordid history making the same darned mistake.

    Cheers,
    Ovid

    Update: chicks has updated the original code snippet so that my comments and those of Mr. Muskrat don't appear to make sense. I think it would have been appropriate for chicks to make note of that. The original snippet resembled the following (I can't recall it exactly):

    $content =~ s/^(?!\s*<TD>).*$//mg;
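A minimal sketch of how that pattern treats the three variants listed earlier in this reply, assuming the approximate substitution quoted above is close to the original. The sample `$content` and the variable names are hypothetical, made up purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample input: one line the pattern expects, the three
# variants from this reply, and one line that really should be removed.
my $content = join "\n",
    '<TD>plain</TD>',
    '<TD class="foo">attribute</TD>',
    '<td>lower-case</td>',
    '<TD',
    '>split tag</TD>',
    'not a table cell';

# Approximate original substitution: blank every line that does not
# start with optional whitespace followed by a literal <TD>.
(my $filtered = $content) =~ s/^(?!\s*<TD>).*$//mg;

# Collect the lines that still have content after the substitution.
my @kept = grep { length } split /\n/, $filtered;

print "$_\n" for @kept;
```

With this input only `<TD>plain</TD>` survives. The cell with an attribute, the lower-case cell, and both halves of the tag split across a newline are blanked along with the line that was meant to go.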


      My post was not meant to be about HTML at all. I've been down those roads too. I'm working with HTML generated by a database, and it's very consistent. I can also assure you that doing it with the "proper" HTML tools would have taken far more work than throwing a handful of regexes at it.
