in reply to how do i find unclosed tags ?

My approach would be:

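1. find every stretch that runs from an <a to the next </a and extract what lies in between
2. search each extracted piece for another <a; if there is one, the first tag was never closed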

like this:

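#!/usr/bin/perl -w
use strict;

$/ = undef;    # slurp mode: read the whole input in one go
$_ = <>;

# capture everything between each <a and the next </a
# (\b keeps tags such as <abbr> or <address> from matching)
my @ina = /<a\b(.*?)<\/a/isg;

# a captured piece that contains another <a means the first one was never closed
print "Found unterminated <A> tag containing: '$_'\n"
    for grep { /<a\b/i } @ina;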

This is not a perfect solution: it reports a false positive for things like an <a inside a quoted attribute value:

<a href="ok" name="<a> go on">this is ok</a>
but since you're dealing with imperfect input, it should be enough to give you a hint. :-)
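
If the quoted-attribute case does matter to you, one possible refinement is to match each anchor tag as a whole, swallowing quoted attribute values, and to walk the tags in order while remembering whether an <a> is currently open. This is only a rough sketch: the sub name unclosed_anchors and the sample string are made up for illustration, and a tag with an unbalanced quote is simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the character offset of every <a ...> tag
# that is followed by another <a ...> (or by end of input) before any </a>.
sub unclosed_anchors {
    my ($html) = @_;
    my (@unclosed, $open_at);

    # One anchor tag, opening or closing. Quoted attribute values are
    # consumed whole, so an '<a' or '>' inside them is not taken for markup.
    while ($html =~ m{<(/?)a\b(?:"[^"]*"|'[^']*'|[^>"'])*>}gi) {
        my ($is_close, $start) = ($1, $-[0]);
        if ($is_close) {
            undef $open_at;                                # anchor closed properly
        }
        else {
            push @unclosed, $open_at if defined $open_at;  # previous <a> never closed
            $open_at = $start;
        }
    }
    push @unclosed, $open_at if defined $open_at;          # still open at end of input
    return @unclosed;
}

# Made-up sample: the first line is fine, the second has an unclosed <a>.
my $sample = qq{<a href="ok" name="<a> go on">this is ok</a>\n}
           . qq{<a href="1">one <a href="2">two</a>\n};

printf "Unclosed <a> at offset %d: %s\n", $_, substr($sample, $_, 20)
    for unclosed_anchors($sample);
```

For real-world pages a proper HTML parser will still be more robust than any regex; the sketch only removes the one false positive shown above.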