Have you considered first filtering out exactly what you are sure you DON'T need (say, the first x bytes, or everything up to a special tag) and then running HTML::Parser on the remaining data? That way, your search for the important data doesn't have to match against any particular page format. In general, if you know the bounds of the useful information, you should be able to throw away the rest and search only the relevant data. I certainly agree that your method would be the most efficient, albeit not very flexible, but I really see two steps here: trim, then parse. Of course, benchmarks may prove me wrong, since HTML::Parser isn't even Perl (its core is written in C via XS).
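A minimal sketch of the first (trimming) step in core Perl, assuming the useful region is bracketed by known markers. The marker strings `<!-- BEGIN DATA -->` / `<!-- END DATA -->` and the helper names `trim_to_region` and `skip_bytes` are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the text between $start_tag and $end_tag.
# Returns '' if the start marker is absent; keeps everything after the
# start marker if the end marker is absent.
sub trim_to_region {
    my ($html, $start_tag, $end_tag) = @_;
    my $from = index($html, $start_tag);
    return '' if $from < 0;                  # no start marker: nothing useful
    $from += length $start_tag;              # skip past the marker itself
    my $to = index($html, $end_tag, $from);
    $to = length $html if $to < 0;           # no end marker: keep the rest
    return substr($html, $from, $to - $from);
}

# Hypothetical helper for the "first x bytes" case: drop a fixed-size prefix.
# substr counts characters, which equal bytes as long as the data is undecoded.
sub skip_bytes {
    my ($data, $n) = @_;
    return length($data) > $n ? substr($data, $n) : '';
}

my $page = '<html><body><p>header</p><!-- BEGIN DATA --><p>keep me</p>'
         . '<!-- END DATA --><p>footer</p></body></html>';
my $region = trim_to_region($page, '<!-- BEGIN DATA -->', '<!-- END DATA -->');
print "$region\n";    # prints <p>keep me</p>
```

The trimmed string would then be handed to HTML::Parser in the usual way (`$p->parse($region); $p->eof;`), so the parser only ever sees the part of the page you care about and your handlers don't need to know anything about the surrounding layout.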
AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM.
Remember, you can build any logical system with NOR.