http://qs1969.pair.com?node_id=39395


in reply to A grammar for HTML matching

So far it sounds like people aren't interested in this idea. FWIW, I would make good use of something like this. As it stands I have (hangs his head) a few shell scripts doing something similar for me; they date from before I knew that 'Perl Syntax' wasn't an oxymoron (I had been led to believe that random ASCII strings were invariably valid Perl code). Three days ago one of the sites changed its design (they seem to have intentionally broken the strict hierarchy I was counting on), and I have yet to get around to figuring out the new layout. A standardized syntax for "find FOO, then parse out everything until BAZ" would solve this and, if done right, would survive all but the most severe site redesigns.
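For what it's worth, the "find FOO, then parse out everything until BAZ" idea can be sketched in a few lines. This is a rough illustration in Python rather than Perl, not a proposal for the actual grammar; the function name `extract_between` and the sample page are made up for the example, and plain substring markers stand in for whatever matching syntax the real thing would use.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text after the first start_marker and before the next
    end_marker, or None if either marker is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the start marker itself
    end = text.find(end_marker, start)  # only look for the end marker after it
    if end == -1:
        return None
    return text[start:end]


# Hypothetical page: the markers sit in headings, not at fixed tree positions.
page = (
    "<html><body><div>nav</div>"
    "<h2>FOO</h2><p>the part I want</p><h2>BAZ</h2>"
    "<div>footer</div></body></html>"
)

print(extract_between(page, "FOO", "BAZ"))
```

Because the lookup depends only on the two markers and not on where they sit in the tag hierarchy, moving the surrounding divs around would not break it, which is the kind of redesign-tolerance I am after.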

While I see the objections to departing from HTML::Parser, I agree with mcelrath that using a regex to skip past 85K of page to reach the 4K of text that you actually want would be a Good Thing (in many cases a simple grep is all that's needed). If HTML::Parser ends up being part of the solution, if only after the desired portion of the document is reached, then so be it.
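A minimal sketch of that two-stage approach, again in Python for illustration only: a regex cuts the page down to the interesting slice, and a real parser sees only that slice. Python's standard `html.parser.HTMLParser` stands in for HTML::Parser here; the `TextCollector` class, the `text_between` function, and the "begin story"/"end story" comment markers are all hypothetical names invented for the example.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-blank text chunks found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def text_between(html, start_pattern, end_pattern):
    """Use a regex to isolate the region between two patterns, then parse
    only that region. Returns a list of text chunks (empty if no match)."""
    # Non-greedy capture; DOTALL lets '.' span newlines in the page.
    match = re.search(start_pattern + r"(.*?)" + end_pattern, html, re.DOTALL)
    if match is None:
        return []
    collector = TextCollector()
    collector.feed(match.group(1))
    collector.close()
    return collector.chunks


# Hypothetical page: lots of navigation filler around a small marked region.
html = (
    "<div class='nav'>" + "filler " * 1000 + "</div>"
    "<!-- begin story --><p>Headline</p><p>Body <b>text</b></p><!-- end story -->"
    "<div>footer</div>"
)

print(text_between(html, r"<!-- begin story -->", r"<!-- end story -->"))
```

The parser never touches the navigation filler, so a change to the layout outside the two markers should not matter as long as the markers themselves survive.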

Having voiced his support for mcelrath's idea, brainpan steps down from his soapbox.