in reply to regex hangs forever

To add to what grep said: for HTML, you should look at the helper modules on CPAN for parsing it. A good one is HTML::Parser.
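For instance, here is a minimal sketch of pulling something out of a page with HTML::Parser instead of a regex. The original question doesn't say what was being searched for, so the sample markup and the choice of link targets are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; in practice this would be the fetched page.
my $html = '<p>See <a href="http://example.com/a">A</a> and <a href="/b">B</a>.</p>';

my @links;
my $parser = HTML::Parser->new(
    api_version => 3,
    # Called for every start tag; the argspec asks for the lowercased
    # tag name and a hash ref of the tag's attributes.
    start_h => [
        sub {
            my ($tagname, $attr) = @_;
            push @links, $attr->{href}
                if $tagname eq 'a' && defined $attr->{href};
        },
        'tagname, attr',
    ],
);

$parser->parse($html);
$parser->eof;    # tell the parser there is no more input

print "$_\n" for @links;
```

The handler only fires on real start tags, so odd spacing, attribute order, or quoting in the markup doesn't need a special case the way it would in a hand-rolled pattern.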

Reasons:

  • More than one person has read the code, so they have most likely thought of more things that could appear on an HTML page than you will.
  • The overhead is higher, but you're not re-inventing the wheel.
  • What if you want to search for more than one thing in a week? In a year? Do you want to rewrite your search?
  • Did I mention the re-inventing-the-wheel thing?
    --
    Even smart people are dumb in most things...
    Re^2: regex hangs forever
    by jettero (Monsignor) on Dec 15, 2006 at 13:55 UTC
      I have a related question. Coincidentally, I am looking for a way to read table rows from the content of an HTTP::Response object. I was going to fire up a bit of HTML::Parser, but first I wanted to look for something built in. Any tips?

      UPDATE: I had solved this once before, but couldn't find the code in my massive code-scrapbook. What I wanted was HTML::TreeBuilder. It is the bomb.
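Since the scrapbook code is lost, here is a rough sketch of what reading table rows with HTML::TreeBuilder might look like. It is a reconstruction, not the original code. The sample table is made up, and the $html string stands in for whatever you get back from the HTTP::Response object (for example its decoded_content).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the content of an HTTP::Response object.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @rows;
# Walk every <tr>, then collect the text of its <td>/<th> cells.
for my $tr ($tree->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_trimmed_text }
                $tr->look_down(_tag => qr/^t[dh]$/);
    push @rows, \@cells;
}

$tree->delete;    # free the tree explicitly once done with it

print join(' | ', @$_), "\n" for @rows;
```

Note that this collects every row of every table on the page, header row included, so on a real page you would probably narrow the first look_down to the one table you care about.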

      -Paul

        If you are looking to get data from HTML tables, HTML::TableExtract can often do it with much less work than using something like HTML::Parser or HTML::TreeBuilder directly, especially if the tables are complex...
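To give an idea of how little code that can be, here is a minimal sketch with HTML::TableExtract. The table and the header names 'Name' and 'Qty' are invented for the example; on a real page you would pass the column headings of the table you are after.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page; normally this is the fetched HTML.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
HTML

# Pick out tables by their column headings rather than by position.
my $te = HTML::TableExtract->new(headers => [ 'Name', 'Qty' ]);
$te->parse($html);
$te->eof;

my @rows;
for my $table ($te->tables) {
    # Each row is an array ref of cell text, in the order of 'headers'.
    push @rows, [ @$_ ] for $table->rows;
}

print join(' | ', @$_), "\n" for @rows;
```

Selecting by headers means the code keeps working if the page gains another table or the columns are reordered, which is where the hand-written tree walking tends to get fiddly.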


        We're not surrounded, we're in a target-rich environment!