I would use it. I believe what he wants to do is match a code block and extract its content. I have written several small scripts that run from cron to collect some information and file it away, and I ended up using two different methods to grab what I wanted.

The first was just to scan the HTML for a particular comment line and grab almost everything after it. That was the easy one.
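For anyone curious, the comment-marker approach boils down to something like the following. It is a minimal Python sketch written for illustration; the function name and the marker comments are hypothetical placeholders, since the real markers depend entirely on the page being scraped. The optional end marker covers the "almost everything" part, i.e. stopping before a footer.

```python
def text_after_marker(html, start_marker, end_marker=None):
    """Return the lines after the first line containing start_marker.

    If end_marker is given, stop just before the first later line that
    contains it. Returns '' when start_marker never appears.
    (Marker strings are hypothetical; pick whatever the target page uses.)
    """
    lines = html.splitlines()
    for i, line in enumerate(lines):
        if start_marker in line:
            rest = lines[i + 1:]
            if end_marker is not None:
                # Trim everything from the end marker onward.
                for j, tail in enumerate(rest):
                    if end_marker in tail:
                        rest = rest[:j]
                        break
            return "\n".join(rest)
    return ""


page = (
    "<html>\n<body>\n<!-- begin content -->\n"
    "<p>Today's data</p>\n<p>More</p>\n<!-- footer -->\n</body>\n</html>"
)
print(text_after_marker(page, "<!-- begin content -->", "<!-- footer -->"))
```

This is line-oriented on purpose: it only works as long as the marker comment sits on its own line, which is exactly why it was the easy one and also why it breaks when the site's layout changes.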

The second site was more complicated: the data I was trying to extract was in a large table that changed size depending on what they were displaying. I didn't feel like learning HTML::Parser at the time, and I hadn't found HTML::TableExtract either. So I cheated, piped the page through lynx, and grabbed what I wanted from the rendered text output.
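The parser-based alternative to the lynx trick looks roughly like this. It is a sketch using Python's standard-library `html.parser`, swapped in as an analogue of what a table extractor does; it is not the API of HTML::TableExtract or HTML::Parser. The class and function names are hypothetical, and it assumes nested tables can be ignored (cells from every table on the page land in one list of rows). Because it simply collects whatever rows are present, a table that grows or shrinks is not a problem.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>.

    Simplifying assumptions: nested tables are not tracked separately,
    and rows from every table on the page go into one list.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the cell currently open, or None

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row still open at end of input


def extract_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `extract_rows("<table><tr><td>a<td>b<tr><td>c</table>")` gives `[["a", "b"], ["c"]]`, so sloppy markup with missing close tags still comes out in rows.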

So neither of those methods would help you :-) but if you put something like this together, I would use it. I still have to take a look at HTML::TableExtract, but I'll get around to it.

I've seen a few packages on freshmeat.net that will snag comic strips off the web and put them somewhere for you. They might have some good techniques for extracting that stuff.

HTH

In reply to Re: A grammar for HTML matching by elwarren
in thread A grammar for HTML matching by mcelrath
