s0n1c has asked for the wisdom of the Perl Monks concerning the following question:

If I were looking for all occurrences of unclosed anchor tags, is there a pattern I could use? I was thinking something along the lines of <a(.|\n)*?(?!</a)*?(.|\n)*?<a but it's obviously not working. This is attempting to locate two open anchor tags without a closing one in between.

Replies are listed 'Best First'.
Re: how do i find unclosed tags ?
by Joost (Canon) on May 21, 2002 at 13:25 UTC
    My approach would be:

    1. find all <a ... </a matches and extract them
    2. search them for <a strings

    like this:

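    #!/usr/bin/perl -w
    use strict;

    $/ = undef;                    # slurp the whole input at once
    $_ = <>;
    my @ina = /<a(.*?)<\/a/isg;    # text between each <a and the next </a
    print "Found unterminated <A> tag containing: '$_'\n"
        for grep { /<a/i } @ina;   # a second <a inside means the first was never closed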

    This is not a perfect solution: it does not take into account things like an <a inside an attribute value, which it will report as a false positive:

    <a href="ok" name="<a> go on">this is ok</a>
    but since you're dealing with imperfect input, it should be enough to give you a hint. :-)
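
    For anyone who wants to try the two steps above without piping a file in, here is a minimal sketch of the same idea wrapped in a subroutine. The name unterminated_anchors and the sample HTML are made up for illustration; the \b after <a is an addition (so that tags such as <abbr> are not mistaken for anchors) and is not part of the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text captured between each "<a" and the
# next "</a" whenever that text itself contains another "<a".
sub unterminated_anchors {
    my ($html) = @_;

    # Step 1: extract everything between "<a" and the next "</a".
    my @inside = $html =~ /<a\b(.*?)<\/a/isg;

    # Step 2: keep only the captures that contain a second "<a".
    return grep { /<a\b/i } @inside;
}

# Made-up sample input: the second link is never closed.
my $sample = <<'HTML';
<p><a href="one.html">closed</a>
<a href="two.html">never closed
<a href="three.html">closed</a></p>
HTML

print "Found unterminated <A> tag containing: '$_'\n"
    for unterminated_anchors($sample);
```

    On this sample only the capture starting at two.html should be reported. The caveat above still applies: an <a> inside a quoted attribute value is flagged as well.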
Re: how do i find unclosed tags ?
by amarceluk (Beadle) on May 21, 2002 at 13:17 UTC
    I find the easiest way to do this is to mark the closing tags with a unique character - I usually use an asterisk, but it can be any other character that doesn't exist in the rest of the file.
    $lines =~ s/(<\/a>)/\*$1/gms;
    Then I search for two open tags without that character between them.
    $lines =~ /<a[^\*]*<a/gms;
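
    The two lines above can be put together into something runnable. This is only a sketch under a few assumptions: the subroutine name unclosed_spans and the sample string are invented for the example, the marker is a single character, and the \b, the /i flag, the non-greedy match and the lookahead are additions so that each unclosed tag is reported separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every stretch of text that runs from one "<a"
# up to the next "<a" with no marked closing tag in between.
sub unclosed_spans {
    my ($lines, $marker) = @_;
    $marker = '*' unless defined $marker;

    die "marker must be a single character\n" unless length($marker) == 1;
    die "marker '$marker' already occurs in the input\n"
        if index($lines, $marker) >= 0;

    # Mark every closing tag with the marker character (work on a copy).
    (my $marked = $lines) =~ s/(<\/a>)/$marker$1/gi;

    my $m = quotemeta $marker;    # safe to use inside a character class
    my @spans;

    # The lookahead leaves the second "<a" unconsumed, so the next
    # iteration can start a new span from it.
    while ($marked =~ /(<a\b[^${m}]*?)(?=<a\b)/gi) {
        push @spans, $1;
    }
    return @spans;
}

# Made-up sample input: the second link is never closed.
my $lines = '<a href="1">one</a> <a href="2">two <a href="3">three</a>';

print "Unclosed: '$_'\n" for unclosed_spans($lines);
```

    Checking that the marker does not already occur in the input matters: a stray asterisk in the text would otherwise hide an unclosed tag.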