Template Parsing - Finding tag pairs.

vladb has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

I'm currently working on a ColdFusion-like template parser (refer to this node in Meditations for more detail) and am nearly done with the main parser routine. As I'm looking through my code (and debugging ;-), I thought to seek your wisdom on this piece:

This code snippet is supposed to find the outermost closing tag for a given opening tag. Refer to the inline Perl comments for extra details.

                        # have to search a little differently for nested tags
                        # making sure that an ending tag belonging to
                        # a nested opening tag is not processed as the ending
                        # tag for the current opening tag.
                        # In case i sounded awkward, here's a little diagram:
                        #
                        # <cfif>     <--- this tag
                        #    <cfif>  <--- nested open tag
                        #    </cfif> <--- end tag for the nested open tag
                        # </cfif>    <--- end tag for this tag (the one that has
                        #                 to be picked up)
                        #

                        # Actual example:
                        # @chunks =
                        #   . . . . . .
                        # 5  "<cfif bool eq 1>\cJ\cI "           <--- chk_i, found_i
                        # 6  "<cfif foo = bar>\cJ\cI\cI  "
                        # 7  "<cfif bar = foo>\cJ\cI\cI  "
                        # 8  "</cfif>\cJ\cI "
                        # 9  "</cfif>\cJ\cI\cI  \cJ\cI BOOL is true!\cJ"
                        # 10  "<cfelse>\cJ\cIBOOL is false!\cJ"
                        # 11  '</cfif>'                          <--- after: found_i
                        my $opening_tag = $rules->{tag_start}[0] . $tag_name;
                        my $nested = 0; # count of nested open tags found.


                        while ((($chunks[++$found_i] =~ m/^$closing_tag/)
                                ? ($nested > 0 ? $nested-- : 0)
                                : ($chunks[$found_i] =~ m/^$opening_tag/   ? ++$nested : 1))
                               && $found_i < @chunks) {}

My question is do you see any problem with the code? I'm sure some of you have come across similar problems and might know of possible pitfalls or hidden bugs. Please feel free to add to or subtract from this code; all suggestions are much appreciated ;-).

Cheers,

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith


(crazyinsomniac) Re: Template Parsing - Finding tag pairs.
by crazyinsomniac (Prior) on Dec 25, 2001 at 11:07 UTC

My question is do you see any problem with the code?

Yes. For one, half of it is commented out, and there's waaay too much whitespace. But seriously, maintainability is key (and that is not hidden). I took a little time to get this running, and hopefully i'll be the only one who has to (you should really provide a runnable code example in the future, it makes pitfalls easier to spot ;D). The only thing i'd do differently is take this out of the while loop (and use a few ifs and elses here and there).

#!/usr/bin/perl -wl
use strict;
# have to search a little differently for nested tags
# making sure that an ending tag belonging to
# a nested opening tag is not processed as the ending
# tag for the current opening tag.
# In case i sounded awkward, here's a little diagram:
#
# <cfif>     <--- this tag
#    <cfif>  <--- nested open tag
#    </cfif> <--- end tag for the nested open tag
# </cfif>    <--- end tag for this tag (the one that has
#                 to be picked up)
#

# Actual example:

my @chunks = ("<cfif bool eq 1>\cJ\cI ",
"<cfif foo = bar>\cJ\cI\cI  ",
"<cfif bar = foo>\cJ\cI\cI  ",
"</cfif>\cJ\cI ",
"</cfif>\cJ\cI\cI  \cJ\cI BOOL is true!\cJ",
"<cfelse>\cJ\cIBOOL is false!\cJ",
'</cfif>');

my $opening_tag = qr/\<cfif/;
my $closing_tag = qr/\<\/cfif/;
my $found_i = 0;
my $nested = 0; # count of nested open tags found.

while
(   (  ($chunks[++$found_i] =~ m/^$closing_tag/)
       ? (
           ($nested > 0)
           ? ($nested--)
           : (0)
         )
       : (
           ($chunks[$found_i] =~ m/^$opening_tag/)
           ? (++$nested)
           : (1)
         )
    )
    &&
    ( $found_i < @chunks )
)
{
    print "F: $found_i ",
          "N: $nested ",
          "C: $chunks[$found_i]",
          ;
}
__END__
F:\dev\vladb>perl nestag.pl
F: 1 N: 1 C: <cfif foo = bar>

F: 2 N: 2 C: <cfif bar = foo>

F: 3 N: 1 C: </cfif>

F: 4 N: 0 C: </cfif>

         BOOL is true!

F: 5 N: 0 C: <cfelse>
        BOOL is false!
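For what it's worth, here is a minimal sketch of what taking the logic out of the while condition could look like, using plain if/elsif. The helper name find_closing_index and its argument order are made up for illustration and are not part of vladb's parser. It also tests the index before reading the chunk, so unbalanced input comes back as undef instead of reading past the end of @chunks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the index of the
# chunk that closes the tag opened at $start, or undef if there is none.
sub find_closing_index {
    my ($chunks, $start, $opening_tag, $closing_tag) = @_;
    my $nested = 0;    # count of nested open tags seen so far

    # The range stops at the last valid index, so no chunk past the end
    # of the array is ever read.
    for my $i ($start + 1 .. $#$chunks) {
        if ($chunks->[$i] =~ /^$closing_tag/) {
            return $i if $nested == 0;    # closes the tag we started from
            $nested--;                    # closes a nested tag
        }
        elsif ($chunks->[$i] =~ /^$opening_tag/) {
            $nested++;                    # another nested open tag
        }
    }
    return;    # ran off the end: unbalanced input
}

# Same sample data as in the script above.
my @chunks = ("<cfif bool eq 1>\cJ\cI ",
              "<cfif foo = bar>\cJ\cI\cI  ",
              "<cfif bar = foo>\cJ\cI\cI  ",
              "</cfif>\cJ\cI ",
              "</cfif>\cJ\cI\cI  \cJ\cI BOOL is true!\cJ",
              "<cfelse>\cJ\cIBOOL is false!\cJ",
              '</cfif>');

my $end = find_closing_index(\@chunks, 0, qr/<cfif/, qr/<\/cfif/);
print defined $end ? "closing tag at index $end\n" : "no closing tag found\n";
```

With the sample @chunks this reports index 6, which is the same chunk the original loop stops on.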


___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"


Re: (crazyinsomniac) Re: Template Parsing - Finding tag pairs.

by vladb (Vicar) on Dec 25, 2001 at 11:52 UTC


#!/usr/local/bin/perl -w
use strict;
my $is_nested = 0;

my @chunks = ("<cfif bool eq 1>\cJ\cI ",
"<cfif foo = bar>\cJ\cI\cI  ",
"<cfif bar = foo>\cJ\cI\cI  ",
"</cfif>\cJ\cI ",
"</cfif>\cJ\cI\cI  \cJ\cI BOOL is true!\cJ",
"<cfelse>\cJ\cIBOOL is false!\cJ",
'</cfif>');

my $opening_tag = qr/\<cfif/;
my $closing_tag = qr/\<\/cfif/;
my $found_i = 0;

unless ($is_nested) {
    # search for the closing pair (starting at the place the
    # first tag was found + 1)
    # note: this search is good for non-nested tags...

    # FIRST WHILE
    while (($chunks[++$found_i] !~ m/^$closing_tag/) && $found_i < @chunks) {}

} else {
    # have to search a little differently for nested tags
    # making sure that an ending tag belonging to
    # a nested opening tag is not processed as the ending
    # tag for the current opening tag.
    # In case i sounded awkward, here's a little diagram:
    #
    # <cfif>     <--- this tag
    #    <cfif>  <--- nested open tag
    #    </cfif> <--- end tag for the nested open tag
    # </cfif>    <--- end tag for this tag (the one that has
    #                 to be picked up)
    #

    my $nested = 0; # count of nested open tags found.

    # SECOND WHILE
    while ((($chunks[++$found_i] =~ m/^$closing_tag/)
             ? ($nested > 0 ? $nested-- : 0)
             : ($chunks[$found_i] =~ m/^$opening_tag/ ? ++$nested : 1))
           && $found_i < @chunks) {

        print "F: $found_i ",
              "N: $nested ",
              "C: $chunks[$found_i]",
              ;

    }

}

($chunks[++$found_i] !~ m/^$closing_tag/)

? ($nested > 0 ? $nested-- : 0) : ($chunks[$found_i] =~ m/^$opening_tag/ ? ++$nested : 1)
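A small sketch comparing the two loop conditions above, assuming balanced input. The sub names flat_search and counting_search and the two sample arrays are invented for the comparison; the loop conditions themselves are copied from the script. As far as I can tell, when no nested opening tag is seen, $nested stays at 0 and the second condition behaves like the first, so the counting loop may be able to cover both cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $opening_tag = qr/\<cfif/;
my $closing_tag = qr/\<\/cfif/;

# FIRST WHILE from the script: stop at the first closing tag.
sub flat_search {
    my @chunks  = @_;
    my $found_i = 0;
    while (($chunks[++$found_i] !~ m/^$closing_tag/) && $found_i < @chunks) {}
    return $found_i;
}

# SECOND WHILE from the script: skip closing tags that belong to nested
# opening tags by keeping a depth counter.
sub counting_search {
    my @chunks  = @_;
    my $found_i = 0;
    my $nested  = 0;
    while ((($chunks[++$found_i] =~ m/^$closing_tag/)
             ? ($nested > 0 ? $nested-- : 0)
             : ($chunks[$found_i] =~ m/^$opening_tag/ ? ++$nested : 1))
           && $found_i < @chunks) {}
    return $found_i;
}

# Illustrative data only: one flat block and one nested block.
my @flat   = ("<cfif a>", "text", "</cfif>", "more");
my @nested = ("<cfif a>", "<cfif b>", "</cfif>", "</cfif>");

printf "flat:   first=%d second=%d\n", flat_search(@flat),   counting_search(@flat);
printf "nested: first=%d second=%d\n", flat_search(@nested), counting_search(@nested);
```

On the flat input both subs stop on the same chunk; on the nested input only the counting version reaches the outer closing tag. If that holds for real templates too, the $is_nested switch might not be needed, but treat that as an observation rather than a tested claim about the full parser.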

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith


Re: Template Parsing - Finding tag pairs.
by Juerd (Abbot) on Dec 25, 2001 at 23:20 UTC

HTML::Parser

Text::Balanced
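A rough sketch of the Text::Balanced route, assuming the template is held as one string rather than as the pre-split @chunks array, and that the tags look like the cfif examples in this thread. extract_tagged is a real Text::Balanced function; the two tag patterns and the sample template are illustrative guesses, not something taken from vladb's parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Illustrative template: an outer cfif with one nested cfif and a cfelse.
my $template = "<cfif bool eq 1>\n"
             . "  <cfif foo = bar>\n"
             . "    inner\n"
             . "  </cfif>\n"
             . "  BOOL is true!\n"
             . "<cfelse>\n"
             . "  BOOL is false!\n"
             . "</cfif>\n"
             . "rest of template\n";

# The second and third arguments are patterns for the opening and closing
# tags. In list context the first two return values are the extracted
# block and whatever follows it.
my ($extracted, $remainder) =
    extract_tagged($template, '<cfif[^>]*>', '</cfif>');

if (defined $extracted && length $extracted) {
    print "--- extracted ---\n$extracted\n";
    print "--- remainder ---\n$remainder";
}
else {
    print "no balanced cfif block found\n";
}
```

The appeal of this approach is that the nesting bookkeeping is left to the module; whether it fits a chunk-based parser is a separate question.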

2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$

Re: Re: Template Parsing - Finding tag pairs.

by IlyaM (Parson) on Dec 25, 2001 at 23:59 UTC

HTML::Parser

--
Ilya Martynov (http://martynov.org/)


Re: Re: Re: Template Parsing - Finding tag pairs.

by Juerd (Abbot) on Dec 26, 2001 at 00:14 UTC

$htmlparser->report_tags(qw/cfif cfelseif cfelse cfoutput cfinclude cfetcetera/)

2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$

Re(4): Template Parsing - Finding tag pairs.

by IlyaM (Parson) on Dec 26, 2001 at 00:17 UTC

Re: Re(4): Template Parsing - Finding tag pairs.

by Juerd (Abbot) on Dec 26, 2001 at 00:26 UTC


Re: Re(4): Template Parsing - Finding tag pairs.

by vladb (Vicar) on Dec 26, 2001 at 00:47 UTC
