Re: A grammar for HTML matching

I would use it. I believe what he wants to do is match a code block and extract the content. I have written several small scripts that run via cron to collect some information and file it away. I ended up using two different methods to grab what I wanted.

The first was just to scan the HTML for a known comment line and grab most everything after it. That was the easy one.
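
For anyone who wants to try the comment-marker trick, here is a rough sketch in Python. It is not the original script: the marker text, the function name, and the sample page are all made up for illustration, and it assumes the marker sits on a line of its own.

```python
def grab_after_marker(html, marker="<!-- BEGIN DATA -->"):
    """Return the text after the first line containing `marker`, or None.

    `marker` is a hypothetical comment string; use whatever the target
    page actually emits.
    """
    lines = html.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            # Keep everything after the marker line itself.
            return "\n".join(lines[i + 1:]).strip()
    return None  # marker not found, so the page layout probably changed


# Made-up sample page for illustration only.
page = """<html><body>
<p>header junk</p>
<!-- BEGIN DATA -->
<p>temperature: 21</p>
</body></html>"""

if __name__ == "__main__":
    print(grab_after_marker(page))
```

Returning None when the marker is missing makes it easy for a cron job to notice that the site changed instead of silently filing away garbage.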

The second site was more complicated: the data I was trying to extract was in a large table that changed size depending on what they were displaying. I didn't feel like learning HTML::Parser at the time, and I hadn't found HTML::TableExtract yet either. I cheated and piped the page through lynx and grabbed what I wanted from the parsed text output.
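
If shelling out to lynx is not an option, a table that changes size can also be pulled apart with a small event-driven parser. The sketch below is a swapped-in technique, not the lynx approach described above and not HTML::TableExtract: it uses Python's standard html.parser module, the class and function names are made up, and it assumes a simple table with no nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return all table rows found in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample table for illustration only.
sample = ("<table><tr><th>Name</th><th>Qty</th></tr>"
          "<tr><td>apples</td><td> 3 </td></tr></table>")

if __name__ == "__main__":
    for row in table_rows(sample):
        print(row)
```

Because rows are collected as they appear, it does not matter how many of them the site decides to show on a given day.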

So neither of those methods would help you :-) but if you put something like this together I would use it. I still have to take a look at HTML::TableExtract, but I'll get around to it.

I've seen a few packages on freshmeat.net that will snag comic strips off the web and put them somewhere for you. They might have some good techniques for extracting that stuff.

HTH
