I'm reposting my response here, since the question was originally posted to Perl Monks Discussion.

I'm not a seasoned user of HTML::Parser, but I believe it calls a function for each opening tag, each closing tag, and each piece of text between tags. If that's the case, you can set a flag when you encounter a certain opening tag, append all the text you see to a variable while the flag is set, and clear the flag when you reach the corresponding closing tag. At that point you can store the collected text wherever you want.

Using the HTML::Parser version 2 subclassing interface, it would look something like this (untested code, based on sample code from the HTML::Parser documentation):

    use strict;
    use warnings;

    {
        package MyParser;
        use base 'HTML::Parser';

        # Capture state, shared by the three callbacks below.
        my (%capturing, %text);

        # Called for each opening tag; tag names arrive lowercased.
        sub start {
            my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
            if ($tagname eq 'blockquote') {
                $capturing{blockquote} = 1;
                $text{blockquote}      = "";
            }
        }

        # Called for each closing tag.
        sub end {
            my ($self, $tagname, $origtext) = @_;
            if ($tagname eq 'blockquote') {
                $capturing{blockquote} = 0;
                # Do whatever you want to do with $text{blockquote}
            }
        }

        # Called for each piece of text between tags.
        sub text {
            my ($self, $origtext, $is_cdata) = @_;
            $text{blockquote} .= $origtext if $capturing{blockquote};
        }
    }

    my $p = MyParser->new;
    $p->parse_file("foo.html") or die "Can't open foo.html: $!";

This will capture all the text between BLOCKQUOTE tags. Of course, you can do more complex rules for capturing what you want and storing it where you want it, but the general idea should be the same.
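
As an illustration of what "more complex rules" might look like, here is a rough sketch that captures several tags at once and copes with nested tags of the same name by counting depth instead of using a simple on/off flag. The package name CaptureParser, the %wanted list, and the depth, buffer and captured hash keys are all made up for this example. It relies on the parser object being a plain hash in which only keys starting with _hparser are reserved, so treat it as a starting point rather than a tested solution.

```perl
use strict;
use warnings;

{
    package CaptureParser;    # hypothetical name, not part of HTML::Parser
    use base 'HTML::Parser';

    # Hypothetical list of tags whose text we want to collect.
    my %wanted = map { $_ => 1 } qw(blockquote pre);

    sub start {
        my ($self, $tagname) = @_;
        return unless $wanted{$tagname};
        # Start a fresh buffer only at the outermost opening tag.
        $self->{buffer}{$tagname} = '' unless $self->{depth}{$tagname}++;
    }

    sub end {
        my ($self, $tagname) = @_;
        return unless $wanted{$tagname} && $self->{depth}{$tagname};
        # Store the collected text once the outermost closing tag is seen.
        if (--$self->{depth}{$tagname} == 0) {
            push @{ $self->{captured}{$tagname} },
                delete $self->{buffer}{$tagname};
        }
    }

    sub text {
        my ($self, $origtext) = @_;
        # Append the text to every tag that is currently open.
        for my $tagname (keys %{ $self->{depth} || {} }) {
            $self->{buffer}{$tagname} .= $origtext
                if $self->{depth}{$tagname};
        }
    }
}

my $p = CaptureParser->new;
$p->parse('<p>skip</p><blockquote>one <blockquote>two</blockquote>'
        . '</blockquote><pre>code</pre>');
$p->eof;
```

After parsing, @{ $p->{captured}{blockquote} } and @{ $p->{captured}{pre} } hold one string per outermost element. Keeping the state in the object rather than in package-level hashes also means two parsers can run side by side without stepping on each other.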

--ZZamboni


In reply to (ZZamboni) Re: I need help with some logic by ZZamboni
in thread I need help with some logic. by shaba
