This code checks for missing opening or closing HTML tags. At present it only works with simple tags that carry no attributes, such as B and U. It could easily be extended with a few more regexps to handle tags that have attributes.
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Enter the tags to check for. Can only handle tags with
# no attributes at the moment.
my @tags = qw(b i u);

# Put it in some HTML
my $html = "<b>Hello</b><u>Hello</i>";    # i.e. no </u> and no <i> tag

# Iterate through each tag
foreach my $tag (@tags) {
    # Start each tag with fresh counters
    my ($opening, $closing) = (0, 0);

    # While $html still contains that tag (opening or closing)
    while ($html =~ /<\Q$tag\E>/i or $html =~ m{</\Q$tag\E>}i) {
        # If it contains the opening tag, keep count
        # of how many opening tags there are.
        $opening++ if $html =~ /<\Q$tag\E>/i;

        # If it contains the closing tag, keep count
        # of how many closing tags there are.
        $closing++ if $html =~ m{</\Q$tag\E>}i;

        # Remove one of each for the next iteration
        $html =~ s/<\Q$tag\E>//i;
        $html =~ s{</\Q$tag\E>}{}i;
    }

    # If the number of opening tags and closing tags
    # don't match, then something is wrong
    if ($opening > $closing) {
        # More opening than closing, therefore missing closing
        print "Warning, missing </$tag> tag\n";
    }
    elsif ($closing > $opening) {
        print "Warning, missing <$tag> tag\n";
    }
}
```
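The introduction mentions extending this to tags with attributes. Below is a minimal sketch of one way to do that. It keeps the same count-and-compare idea but loosens the opening-tag regexp to `<tag\b[^>]*>` and counts matches directly instead of deleting them one by one. It assumes that comparing counts is enough, so it does not check nesting order, and a `>` inside a quoted attribute value will still confuse it. The sub name `check_tags` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a list of
# warning strings, one for each tag whose opening and closing counts differ.
# Opening tags may carry attributes, e.g. <b class="x">.
sub check_tags {
    my ($html, @tags) = @_;
    my @warnings;
    for my $tag (@tags) {
        # A //g match in list context returns every match; assigning that
        # list to () and then to a scalar yields the number of matches.
        # \b stops "b" from also matching <br> or <body>.
        my $opening = () = $html =~ /<\Q$tag\E\b[^>]*>/gi;
        my $closing = () = $html =~ m{</\Q$tag\E\s*>}gi;
        if    ($opening > $closing) { push @warnings, "missing </$tag> tag" }
        elsif ($closing > $opening) { push @warnings, "missing <$tag> tag" }
    }
    return @warnings;
}

my $html = '<b class="x">Hello</b><u style="color:red">Hello</i>';
print "Warning, $_\n" for check_tags($html, qw(b i u));
```

Using `\b` after the tag name is the key design choice here. Without it, a check for `b` would also count `<br>` and `<body>` as opening `b` tags.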

RE: Missing HTML tags
by turnstep (Parson) on Apr 27, 2000 at 02:21 UTC
    Here's something to try out:
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "Need a file name!\n";
my @closedtags = qw(html head body b i u);   # tags that should be balanced
my @opentags   = qw(a img);                  # tags not required to close (not used below)

open(my $fh, '<', $file) or die "Could not open $file: $!\n";
my $html = do { local $/; <$fh> };           # slurp the whole file
close($fh);

## The magic part: +1 for every opening tag, -1 for every closing tag
my %tag;
while ($html =~ m#<(/?)([^\s>]*)[^>]*>#g) {
    if ($1) { $tag{lc $2}--; }
    else    { $tag{lc $2}++; }
}

## Now we have lots of options to play with:
## Show ALL tags, matched then unmatched:
print "Matched tags:\n";
for my $x (sort keys %tag) { print "$x\n" unless $tag{$x}; }
print "Unmatched tags:\n";
for my $x (sort keys %tag) { print "$x\n" if $tag{$x}; }

## Go through our list of 'closed' tags and check each:
for my $x (@closedtags) {
    printf "Results for html tag %5s: ", $x;
    if (defined $tag{$x}) {
        print $tag{$x} ? "NOT balanced\n" : "balanced\n";
    }
    else {
        print "None found.\n";
    }
}
# Etc.
```

The regexp does a fairly good job, but it misses unusual cases such as a > embedded in a quoted attribute value.
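One possible way to handle the quoted-`>` case is to let the tag pattern consume quoted strings as whole units, so a `>` inside `"..."` or `'...'` cannot end the tag early. This is a sketch under the assumption that attribute values use matching `"` or `'` quotes. HTML comments, `<!DOCTYPE>` and `<script>` bodies are still not treated specially. The sub name `tag_balance` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original reply): returns a hash of
# tag name => (number of opening tags - number of closing tags).
# Inside a tag, a double- or single-quoted string is matched as one unit,
# so a '>' or '<' within an attribute value is not mistaken for markup.
sub tag_balance {
    my ($html) = @_;
    my %tag;
    while ($html =~ m{<(/?)([^\s>/]+)(?:"[^"]*"|'[^']*'|[^'">])*>}g) {
        if ($1) { $tag{lc $2}-- }
        else    { $tag{lc $2}++ }
    }
    return %tag;
}

my %tag = tag_balance(q{<a title="x > <i>" href="/">link</a><b>bold});
for my $x (sort keys %tag) {
    printf "%-3s %s\n", $x, $tag{$x} ? "NOT balanced" : "balanced";
}
```

With the pattern from the reply, the match for the `<a ...>` tag stops at the first `>` inside the `title` value. The `<i>` that follows would then be counted as a real, unbalanced tag. The quote-aware version reports only `a` (balanced) and `b` (not balanced).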