in reply to A grammar for HTML matching
So far it sounds like people aren't interested in this idea. FWIW, I would make good use of something like this; as it stands now I have (hangs his head) a few shell scripts doing something similar for me (written before I knew that 'Perl syntax' wasn't an oxymoron; I had been led to believe that any random string of ASCII was valid Perl code). Three days ago one of the sites changed its design (they seem to have intentionally broken the strict hierarchy I was counting on), and I have yet to get around to figuring out their new layout. A standardized syntax for "find FOO, then parse out everything until BAZ" would solve this, and if done right it would survive all but the most severe site redesigns.
While I see the objections to departing from HTML::Parser, I agree with mcelrath that using a regex to skip past 85K of page to the 4K of text you actually want would be a Good Thing (in many cases a simple grep is all that's needed). If HTML::Parser ends up being part of the solution (if only after the desired portion of the document is reached), then so be it.
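For illustration, here is a minimal sketch of the "skip ahead with a regex, then take everything up to an end marker" approach, using only core Perl. It assumes the wanted region is bracketed by recognizable start and end strings; the helper name extract_between, the sample page and the comment markers are all hypothetical, not anything proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text between the
# first match of $start and the next match of $end, or undef if the markers
# are not found. $start and $end are compiled regexes (qr//).
sub extract_between {
    my ($html, $start, $end) = @_;
    # The regex engine skips ahead to the start marker; the lazy (.*?) then
    # stops at the first end marker after it. /s lets . match newlines.
    return $html =~ /$start(.*?)$end/s ? $1 : undef;
}

# Hypothetical page: filler markup, then the part we actually want.
my $page = "<html><body>\n"
         . ("<p>navigation and ads</p>\n" x 3)
         . "<!-- begin story --><p>The news.</p><!-- end story -->\n"
         . "</body></html>\n";

my $story = extract_between($page,
    qr/<!-- begin story -->/, qr/<!-- end story -->/);
print defined $story ? "$story\n" : "markers not found\n";
```

This prints `<p>The news.</p>`. If real parsing is still needed, only that small fragment would have to be handed to HTML::Parser (through its `parse` and `eof` methods), rather than the whole page.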
Having voiced his support for mcelrath's idea, brainpan steps down from his soapbox.
In section: Seekers of Perl Wisdom