but as a side effect, I lose all HTML tags between the DIVs
??? HTML::Parser "tokenizes" all input (for the most part), and calls appropriate handlers.

What happens to the "html" is that it gets parsed, turned into tokens, and passed as arguments to the handlers...

To preserve the html, you have to recreate it out of the tokens, and store it someplace...
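A minimal sketch of that idea, assuming the goal is to collect everything between an outer <div> and its matching </div>. The helper name div_contents and the depth counter are illustrative and not from the original snippet. It relies on the "text" argspec of the HTML::Parser version 3 API, which hands each handler the original source text of the event, so the markup can be glued back together:

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns the raw inner HTML of each outermost <div>.
sub div_contents {
    my ($html) = @_;
    my @found;
    my $depth = 0;     # how many <div>s we are currently inside
    my $buf   = '';    # recreated HTML for the current outermost <div>

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            # Outermost <div>: start a fresh buffer, do not store the tag itself
            if ($tag eq 'div' && $depth++ == 0) { $buf = ''; return }
            $buf .= $text if $depth;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            # Matching </div> of the outermost one: save what was collected
            if ($tag eq 'div' && $depth > 0 && --$depth == 0) {
                push @found, $buf;
                return;
            }
            $buf .= $text if $depth;
        }, 'tagname, text' ],
        # Plain text, comments, declarations, etc. all land here
        default_h => [ sub {
            $buf .= $_[0] if $depth && defined $_[0];
        }, 'text' ],
    );

    $p->parse($html);
    $p->eof;
    return @found;
}

print "$_\n" for div_contents('<p>x</p><div>Hi <b>there</b></div>');
```

Nested <div>s are kept as part of the outer one's content; only the outermost pair is stripped.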

Looking at your code snippet, and what you're trying to do, it looks like you would be better off using HTML::TokeParser (an alternative interface to HTML::Parser, where you don't set up "handlers" which process the data automatically, but you "pull" tokens out of the data, and are able to "seek" back and forth through the file).
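For comparison, a rough sketch of the same job done the "pull" way with HTML::TokeParser. Again, div_contents_pull is a made-up name and the depth counting is just one possible approach. It uses get_tag to skip ahead to the next <div> and get_token to read tokens one at a time; the original source text is the last element of every token except text tokens, where it is the second element:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns the raw inner HTML of each outermost <div>.
sub div_contents_pull {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Could not create parser";
    my @found;

    # get_tag('div') skips everything up to the next <div> start tag
    while ($p->get_tag('div')) {
        my ($depth, $buf) = (1, '');
        while (my $t = $p->get_token) {
            my $type = $t->[0];
            if ($type eq 'S' && $t->[1] eq 'div') {
                $depth++;                      # nested <div>
            }
            elsif ($type eq 'E' && $t->[1] eq 'div') {
                last if --$depth == 0;         # matching </div> reached
            }
            # ["T", $text, $is_data] keeps its text at index 1;
            # S, E, C, D and PI tokens keep the source text last
            $buf .= $type eq 'T' ? $t->[1] : $t->[-1];
        }
        push @found, $buf;
    }
    return @found;
}

print "$_\n" for div_contents_pull('<p>x</p><div>Hi <b>there</b></div>');
```

There are no callbacks and no shared state between handlers here; the loop itself decides when to start and stop collecting.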

There is a tutorial, incidentally by me, in the Tutorials section, aptly named HTML::TokeParser Tutorial. I suggest you also take a look at the XML::Parser Tutorial, as it seems you'd also be better off using proper XML to store your "data" (btw - the HTML::Parser and XML::Parser interfaces are very very similar - only a few "name" changes ;D).

 
___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void

perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"


In reply to (crazyinsomniac) Re: HTML::Parser - getting all contained HTML? by crazyinsomniac
in thread HTML::Parser - getting all contained HTML? by howie
