
Contrary to jpeg's suggestion, I'd recommend using a stack to keep track of tags. Each time you see an open tag, push it on the stack; each time you see a close tag, check the last thing on the stack and see if it's the same sort of tag.

If it's a match, you're fine: pop that last element off the stack and move on. If it's not a match, look further down the stack for an open tag that matches this end-tag. If you find one, then all the open tags above it on the stack are probably lacking their end-tags. If the current end-tag has no match anywhere in the stack, then you know you're missing an open tag for it.

And to contradict jpeg yet again, here's an example of the technique:

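#!/usr/bin/perl

use strict;
use warnings;

# Slurp the whole input (file arguments or STDIN) into $_.
$/ = undef;
$_ = <>;

my @stack  = ();   # names of open tags still waiting for their close tags
my $offset = 0;    # position in the original input where $_ now starts

while ( ( my $i = index( $_, "<" ) ) >= 0 ) {
    $offset += $i;
    $_ = substr( $_, $i );
    if ( s{^(<(\w+)[^>]*>)}{} ) {         # open tag (may span several lines)
        $offset += length( $1 );
        push @stack, $2;
    }
    elsif ( s{^(</(\w+)>)}{} ) {          # close tag: compare with the stack
        my ( $tag, $et ) = ( $1, $2 );    # save both captures up front
        if ( @stack and $stack[-1] eq $et ) {
            pop @stack;
        }
        elsif ( grep { $_ eq $et } @stack ) {   # exact name, not a substring
            print "missing end-tags for:";
            while ( @stack and $stack[-1] ne $et ) {
                print " " . pop @stack;
            }
            pop @stack;                   # discard the open tag that did match
            print " at </$et> (offset: $offset)\n";
        }
        else {
            print "missing open-tag for $et (offset: $offset)\n";
        }
        $offset += length( $tag );
    }
    else {                                # "<" that starts no tag: step past it
        $_ = substr( $_, 1 );
        $offset++;
    }
}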

### updated: added condition on inner while loop to check for empty stack

Now, the results printed by that approach can be inaccurate or misleading under certain circumstances, but you will at least get a reasonable look at where the problems start.

And of course, if you have data with lots of elaborate stuff in the tags (e.g. a close-angle-bracket inside a quoted string that is part of an attribute value in an open tag), then this approach will be thrown off totally, and you'll need to parse the input more carefully. Good luck with that.
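If quoted attribute values are the only complication, a slightly smarter pattern may be enough. Here is a minimal sketch, not a real parser: the name $OPEN_TAG and the helper open_tag_names() are made up for this illustration, and it assumes every quote inside a tag is properly closed.

```perl
use strict;
use warnings;

# Hypothetical pattern: an open tag whose attribute values may contain ">"
# as long as that ">" sits inside single or double quotes.
my $OPEN_TAG = qr{<(\w+)(?:"[^"]*"|'[^']*'|[^>"'])*>};

# Return the names of all open tags found in the given string.
sub open_tag_names {
    my ($text) = @_;
    my @names;
    while ( $text =~ /$OPEN_TAG/g ) {
        push @names, $1;
    }
    return @names;
}

my @names = open_tag_names( q{<a title="x > y" href='z'>link</a><b>bold</b>} );
print "@names\n";
```

The alternation consumes a whole quoted string in one step, so a ">" inside quotes never ends the tag. Comments, CDATA sections and unbalanced quotes will still fool it.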

(One more update: it's possible that there might be open-angle brackets in the text, which are not intended as the beginning of a tag -- this isn't supposed to happen, the text is supposed to use "&lt;" instead of a bare "<", but hey, it happens, and it will also cause this script to fail, or at least create a lot of false-alarm error reports. Perhaps that's just as well...)
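If you'd rather find those stray brackets up front, something like the following might help. It is only a heuristic sketch: bare_lt_offsets() is a name invented here, and it assumes that any "<" followed by a word character, "/", "!" or "?" really does start a tag, comment or processing instruction, which won't hold for text like "a<b".

```perl
use strict;
use warnings;

# Return the offsets of every "<" that is NOT followed by a word character,
# "/", "!" or "?", i.e. every "<" that cannot be the start of a tag.
sub bare_lt_offsets {
    my ($text) = @_;
    my @offsets;
    while ( $text =~ m{<(?![\w/!?])}g ) {
        push @offsets, pos($text) - 1;    # pos() is just past the "<"
    }
    return @offsets;
}

my @found = bare_lt_offsets('if a < b then <b>bold</b>');
print "bare '<' at offset(s): @found\n";
```

Running this over the input before the stack check gives you a list of places to fix (or to replace with "&lt;") so they don't show up later as false alarms.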

In reply to Re: Pair Tag missing by graff
in thread Pair Tag missing by Anonymous Monk
