in reply to HTML::Parser API script or module
I would take a serious look at HTML::Tree (which is composed of HTML::TreeBuilder and HTML::Element). It runs on top of HTML::Parser and has lots of great functions that can do exactly what you are looking for, as well as lots of other tree-walking and manipulating magic.
For example, you could do something like this (untested):

    use HTML::Tree;
    my $tree = HTML::Tree->new_from_file("myfile.html");
    my $html_object = $tree->look_down(
        "_tag", "x",    # find an x element
        "y", qr/z/,     # that has a y attribute matching z
    );

IIRC, you can do the same thing in list context and get a list of all the matching elements in the file, or in a particular branch of the tree.
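To make the scalar versus list context point concrete, here is a small sketch (lightly checked at best). The sample markup, the div tag, and the class attribute are all made up for illustration, and it parses a string with HTML::TreeBuilder->new_from_content instead of reading a file so that it stands on its own:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup; any HTML source would do.
my $html = '<html><body>'
         . '<div class="note">first</div>'
         . '<div class="other">second</div>'
         . '<div class="footnote">third</div>'
         . '</body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Scalar context: only the first matching element, in document order.
my $first = $tree->look_down( _tag => 'div', class => qr/note/ );

# List context: every matching element.
my @all = $tree->look_down( _tag => 'div', class => qr/note/ );

# Copy out what we need before the tree is torn down.
my $first_text = $first->as_text;
my @texts      = map { $_->as_text } @all;

print scalar(@all), " matching elements\n";
print "$_\n" for @texts;

$tree->delete;    # free the tree once we are done with it
```

Calling look_down on an element other than the root should restrict the search to that element and its descendants, which is how you would search just one part of the document.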
Then you can pull whatever you want (e.g. other attributes) out of your $html_object, or see it as text:

    my $text = $html_object->as_text;

or as raw HTML:

    my $html = $html_object->as_HTML;

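For pulling out the other attributes, something along these lines should work. It is only a sketch: the markup and the href and title attributes are invented for the example, and the attr and all_external_attr calls come from HTML::Element as I remember it:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, made up for this sketch.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><a href="/docs" title="Docs">Read <b>this</b></a></body></html>'
);

my $link = $tree->look_down( _tag => 'a', href => qr/docs/ );

my $title  = $link->attr('title');        # one attribute, by name
my %attrs  = $link->all_external_attr;    # all attributes, minus internal "_..." keys
my $text   = $link->as_text;              # text content with the markup stripped
my $markup = $link->as_HTML;              # the element serialized back to HTML

print "title: $title\n";
print "$_ = $attrs{$_}\n" for sort keys %attrs;
print "text: $text\n";
print "html: $markup\n";

$tree->delete;
```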
There is a bit of a learning curve (or at least there was for me), in that you have to get used to thinking of your document as a tree, and not as text per se. But once you've got it, you can do a lot of things very cleanly.