(crazyinsomniac) Re: HTML::Parser - getting all contained HTML?

but as a side effect, I lose all HTML tags between the DIVs

??? HTML::Parser "tokenizes" all input (for the most part), and calls appropriate handlers.

What happens to the "html" is that it gets parsed, turned into tokens, and passed as arguments to the handlers...

To preserve the html, you have to recreate it out of the tokens, and store it someplace...
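Something along these lines is what I mean by recreating the html out of the tokens. Treat it as a rough sketch, not a drop-in fix: div_contents is a name I made up, and I'm assuming you want everything between a <div> and its matching </div>, nested tags included. It leans on the 'text' argspec of the version 3 API, which hands each handler the original source text of the token.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the raw HTML found inside each
# top-level <div>...</div> of the given document string.
sub div_contents {
    my ($html) = @_;
    my $depth  = 0;     # how many <div>s we are currently inside
    my $buffer = '';    # source text collected for the current <div>
    my @found;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            $buffer .= $text if $depth;    # keep nested start tags verbatim
            $depth++ if $tag eq 'div';
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            if ($tag eq 'div' && $depth) {
                if (--$depth == 0) {       # outermost </div>: store and reset
                    push @found, $buffer;
                    $buffer = '';
                    return;
                }
            }
            $buffer .= $text if $depth;    # keep nested end tags verbatim
        }, 'tagname, text' ],
        # text, comments, declarations and so on all land here
        default_h => [ sub {
            $buffer .= $_[0] if $depth && defined $_[0];
        }, 'text' ],
    );

    $p->parse($html);
    $p->eof;    # flush anything the parser is still holding back
    return @found;
}

print "$_\n" for div_contents('<div><b>hi</b> there</div><p>x</p>');
```

The depth counter is there only so that a <div> nested inside another <div> stays part of the outer one instead of ending it early.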

Looking at your code snippet, and what you're trying to do, it looks like you would be better off using HTML::TokeParser (an alternative interface to HTML::Parser, where you don't set up "handlers" which process the data automatically, but "pull" tokens out of the data yourself, and can push them back with unget_token if you read too far).
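For comparison, here is roughly how the "pull" style might look for the same job (grab everything between the DIVs, tags and all). Again only a sketch under my own assumptions: div_contents_pull is an invented name, and I'm guessing you want nested divs kept inside the outer one. The array indexes follow the token layouts documented for get_token, where the original source text is the last element of each token.

```perl
use strict;
use warnings;
use HTML::TokeParser ();

# Hypothetical helper: return the raw HTML found inside each
# top-level <div>...</div> of the given document string.
sub div_contents_pull {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "cannot create parser: $!";
    my @found;

    # get_tag('div') skips ahead to the next <div> start tag
    while ($p->get_tag('div')) {
        my ($depth, $buffer) = (1, '');
        while (my $t = $p->get_token) {
            my $type = $t->[0];
            if ($type eq 'S' && $t->[1] eq 'div') {
                $depth++;
            }
            elsif ($type eq 'E' && $t->[1] eq 'div') {
                last if --$depth == 0;    # matching </div> reached
            }
            # Original source text: S => [4], E and PI => [2], T/C/D => [1]
            $buffer .= $type eq 'S'                  ? $t->[4]
                     : ($type eq 'E' || $type eq 'PI') ? $t->[2]
                     :                                  $t->[1];
        }
        push @found, $buffer;
    }
    return @found;
}

print "$_\n" for div_contents_pull('<div><b>hi</b> there</div><p>x</p>');
```

No handlers to register and no state shared between callbacks: the loop asks for the next token only when it wants one, which is why this interface tends to suit "find X, then read until Y" jobs.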

There is a tutorial, incidentally by me, in the Tutorials section, aptly named HTML::TokeParser Tutorial. I suggest you also take a look at the XML::Parser Tutorial, as it seems you'd be better off using proper XML to store your "data" (btw - the HTML::Parser and XML::Parser interfaces are very, very similar - only a few "name" changes ;D).

___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Re: (crazyinsomniac) Re: HTML::Parser - getting all contained HTML? by Anonymous Monk on Sep 20, 2001 at 03:32 UTC
Having had a good hard look at the docs, I figured it out using HTML::Parser directly, by (as you said) adding all the handlers and reconstructing the appropriate parts of the document. I know I'd be better off with XML, but this was a one-off conversion from the HTML docs into something more 'edible'. Next time I have something like this, I'll try TokeParser.