in reply to Slow regexp

As the others already pointed out, HTML::Parser or one of its subclasses is the way to go. Here I use HTML::TokeParser, which saves you some work.
The code below is not equivalent to yours, but it is something to get you started.
use strict;
use warnings;
use HTML::TokeParser;
use Data::Dumper;

my %tokens;      # how many times each tag is currently open
my %tokencount;  # words seen inside each tag

my $p = HTML::TokeParser->new("test.html") or die "Can't open: $!";

while (my $token = $p->get_token) {
    if ( $token->[0] eq "S" ) {        # start tag
        $tokens{$token->[1]}++ unless $token->[1] =~ /meta/i;
    }
    elsif ( $token->[0] eq "E" ) {     # end tag
        $tokens{$token->[1]}--;
    }
    elsif ( $token->[0] eq "T" ) {     # text
        my @words = ( $token->[1] =~ /\b(\w+)/g );
        # credit the words to every tag that is open right now
        for ( keys %tokens ) {
            $tokencount{$_} += @words if $tokens{$_} > 0;
        }
    }
}

print Dumper(\%tokencount);
When test.html looks like
<html lang='en-US'>
<head>
<title>Stuff</title>
<meta name='author' content='Jojo' />
</head>
<body>
<h2>I like potatoes!</h2>
<h1>Me not!</h1>
</body>
</html>
it will print
$VAR1 = {
          'h1' => 2,
          'body' => 5,
          'head' => 1,
          'html' => 6,
          'title' => 1,
          'h2' => 3
        };
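If you want to play with this without creating test.html first, here is a rough sketch of the same idea wrapped in a sub that parses a string. HTML::TokeParser->new accepts a reference to a scalar holding the document. The sub name count_words_per_tag is made up for this example, and unlike the code above it only decrements a counter that is already positive, so a stray end tag cannot push it below zero. Treat it as a starting point, not a finished tool.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns a hashref of tag name => number of words
# found inside that tag (including words in nested tags).
sub count_words_per_tag {
    my ($html) = @_;
    my ( %open, %count );

    # A scalar reference makes the parser read the string, not a file.
    my $p = HTML::TokeParser->new( \$html )
        or die "Can't create parser";

    while ( my $token = $p->get_token ) {
        my ( $type, $value ) = @$token;
        if ( $type eq 'S' ) {
            # skip <meta>, which never gets an end tag
            $open{$value}++ unless $value =~ /^meta\b/;
        }
        elsif ( $type eq 'E' ) {
            # assumption: ignore end tags that were never opened
            $open{$value}-- if $open{$value};
        }
        elsif ( $type eq 'T' ) {
            my @words = $value =~ /\b(\w+)/g;
            $count{$_} += @words for grep { $open{$_} > 0 } keys %open;
        }
    }
    return \%count;
}

my $html = <<'HTML';
<html lang='en-US'>
<head>
<title>Stuff</title>
<meta name='author' content='Jojo' />
</head>
<body>
<h2>I like potatoes!</h2>
<h1>Me not!</h1>
</body>
</html>
HTML

my $count = count_words_per_tag($html);
# sort the keys so the report comes out in the same order every run
printf "%-5s %d\n", $_, $count->{$_} for sort keys %$count;
```

For the sample document this should report the same counts as the Dumper output above, only sorted by tag name.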

holli, regexed monk