in reply to Parsing HTML files to recover data...
I think you should go for Template::Extract (http://search.cpan.org/dist/Template-Extract/), which is really great. It takes a template written in Template syntax and uses it to reconstruct a data structure from your data files.
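Basically, it does the opposite job of the Template module: Template fills a template with data to produce text, while Template::Extract matches a template against text to get the data back.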
I use it to extract information from emails.
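Here is a minimal sketch of that round trip, assuming both Template Toolkit and Template::Extract are installed from CPAN. The order-notification template and its `order_id`/`date` fields are made up for illustration. Template renders a mail from a hash, and Template::Extract recovers the hash from the mail:

```perl
use strict;
use warnings;
use Template;            # Template Toolkit: template + data -> text
use Template::Extract;   # the reverse direction: template + text -> data

# Hypothetical notification template; each [% var %] marks a field.
my $template = "Order #[% order_id %] shipped on [% date %].\n";

# Forward direction with Template: fill the placeholders into $mail.
my $tt = Template->new or die Template->error;
my $mail = '';
$tt->process(\$template, { order_id => 4711, date => '2004-05-17' }, \$mail)
    or die $tt->error;

# Reverse direction with Template::Extract: recover the fields as a hashref.
my $extractor = Template::Extract->new;
my $data = $extractor->extract($template, $mail);

print "order_id: $data->{order_id}\n";
print "date:     $data->{date}\n";
```

Real emails rarely match a template exactly, so expect to tune it; the module's documentation also describes `[% ... %]` for skipping text you don't care about.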
Great module, many thanks to its author :)