in reply to HTML::Parser API script or module

I would take a serious look at HTML::Tree (which is composed of HTML::TreeBuilder and HTML::Element). It runs on top of HTML::Parser and has lots of great functions that can do exactly what you are looking for, as well as lots of other tree-walking and manipulating magic.

For example, you could do something like this (untested):

use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file("myfile.html");
my $html_object = $tree->look_down(
    "_tag", "x",      # find an x element
    "y",    qr/z/,    # that has a y attribute matching z
);
IIRC, you can do the same thing in list context and get a list of all the matching <x> elements in the file, or in a particular branch of the tree.
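
Here is a minimal sketch of the list-context form, and of searching just one part of the tree. The HTML snippet, the ids and the variable names are made up for illustration, and I build the tree from a string with new_from_content so the example does not depend on a file:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample document, purely for illustration.
my $source = <<'END_HTML';
<html><body>
<div id="nav"><a href="/home">Home</a> <a href="/about">About</a></div>
<div id="main"><a href="http://example.com/">Example</a> <a name="top">anchor</a></div>
</body></html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($source);

# List context: every <a> element whose href attribute is non-empty.
my @links = $tree->look_down( _tag => 'a', href => qr/./ );

# Scalar context returns only the first match; use it to pick one subtree...
my $main = $tree->look_down( _tag => 'div', id => 'main' );

# ...then search only inside that subtree.
my @main_links = $main->look_down( _tag => 'a', href => qr/^http/ );

my $total_count = scalar @links;
my $main_count  = scalar @main_links;
print "$total_count links in the document, $main_count absolute links in #main\n";

$tree->delete;    # free the tree once you are done with it
```

look_down starts at the element you call it on and checks that element and everything below it, so calling it on $main limits the search to that div.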

Then you can pull whatever you want (e.g. other attributes) out of your $html_object, or see it as text:

my $text = $html_object->as_text;
or as raw HTML:
my $html = $html_object->as_HTML;
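
Putting those pieces together, something like the following should work. It is only a sketch: the sample markup and the attribute names are invented for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Invented sample markup, just to have something to query.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><p class="intro" id="first">Hello <b>world</b></p></body></html>'
);

# Find the first <p> whose class attribute is exactly "intro".
my $p = $tree->look_down( _tag => 'p', class => 'intro' );

my $id    = $p->attr('id');           # value of a single attribute
my %attrs = $p->all_external_attr;    # all real attributes (skips internal ones like _tag)
my $text  = $p->as_text;              # text content with the markup stripped
my $html  = $p->as_HTML;              # the element serialized back to HTML

print "id:   $id\n";
print "text: $text\n";
print "html: $html\n";

$tree->delete;    # free the tree once you are done with it
```

as_HTML also accepts optional arguments that control entity encoding, indentation and which end tags are omitted, so its exact output depends on how you call it.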

There is a bit of a learning curve (or at least there was for me), in that you have to get used to thinking of your document as a tree rather than as text. But once you get it, you can do a lot of things very cleanly.