in reply to A grammar for HTML matching

I would use it. I believe what he wants to do is match a code block and extract the content. I have written several small scripts that run via cron to collect some information and file it away. I ended up using two different methods to grab what I wanted.

The first was just to scan the HTML looking for a comment line and grab most everything after it. That was the easy one.
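For anyone curious, the comment-marker trick looks roughly like the sketch below. It's in Python rather than Perl, and the function name `text_after_comment` and the `BEGIN DATA` marker are made up for illustration; use whatever comment the real page happens to contain.

```python
import re


def text_after_comment(html, marker):
    """Return everything after the first <!-- marker --> comment, or None.

    The marker text is hypothetical; use whatever comment the page contains.
    """
    # Tolerate optional whitespace inside the comment delimiters.
    pattern = re.compile(r"<!--\s*" + re.escape(marker) + r"\s*-->")
    match = pattern.search(html)
    if match is None:
        return None
    return html[match.end():]


# Made-up sample page for illustration only.
page = "<html><body><p>nav</p><!-- BEGIN DATA --><p>42 widgets</p></body></html>"
data = text_after_comment(page, "BEGIN DATA")
```

This only holds up as long as the site keeps emitting the same comment, so it's worth checking for the `None` case and complaining loudly when the marker disappears.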

The second site was more complicated: the data I was trying to extract was in a large table that changed size depending on what they were displaying. I didn't feel like learning HTML::Parser at the time and I hadn't found HTML::TableExtract either. I cheated and piped the page through lynx and grabbed what I wanted from the parsed text output.
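A rough Python sketch of the lynx cheat follows. The assumptions: `lynx` is on the PATH, the page is fed in with `-dump -nolist -stdin`, and the wanted rows sit between two recognizable lines of the rendered text. The helper names and the marker strings are hypothetical, not what my scripts actually used.

```python
import subprocess


def html_to_text(html):
    """Render HTML to plain text by piping it through `lynx -dump`.

    Assumes lynx is installed; -nolist drops the trailing list of links.
    """
    result = subprocess.run(
        ["lynx", "-dump", "-nolist", "-stdin"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def lines_between(text, start, end):
    """Return stripped, non-empty lines between two marker lines.

    `start` and `end` are substrings of the lines that bracket the data;
    the marker lines themselves are not included in the result.
    """
    collected = []
    inside = False
    for line in text.splitlines():
        if not inside:
            # Skip everything up to and including the start marker line.
            inside = start in line
            continue
        if end in line:
            break
        if line.strip():
            collected.append(line.strip())
    return collected
```

Something like `lines_between(html_to_text(page), "Scores", "Footer")` would then return the table rows as plain strings. Because it keys off the rendered text rather than the markup, it doesn't care how many rows the table has, though it will break if the site rewords the lines used as markers.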

So neither of those methods would help you :-) but if you put something like this together I would use it. I still have to take a look at HTML::TableExtract, but I'll get around to it.

I've seen a few packages on freshmeat.net that will snag comic strips off the web and put them somewhere for you. They might have some good techniques for extracting that stuff.

HTH