vladb has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

I'm currently working on a ColdFusion-like template parser (refer to this node in Meditations for more detail) and am nearly done with the main parser routine. As I'm looking through my code (and debugging ;-), I thought to seek your wisdom on this piece:

This code snippet is supposed to find the outermost closing tag for a given opening tag. Refer to the inline Perl comments for extra details.
# have to search a little differently for nested tags
# making sure that an ending tag belonging to
# a nested opening tag is not processed as the ending
# tag for the current opening tag.
# In case i sounded awkward, here's a little diagram:
#
# <cfif>          <--- this tag
#    <cfif>       <--- nested open tag
#    </cfif>      <--- end tag for the nested open tag
# </cfif>         <--- end tag for this tag (the one that has
#                      to be picked up)
#
# Actual example:
# @chunks =
#   . . . . . .
#   5  "<cfif bool eq 1>\cJ\cI "       <--- chk_i, found_i
#   6  "<cfif foo = bar>\cJ\cI\cI "
#   7  "<cfif bar = foo>\cJ\cI\cI "
#   8  "</cfif>\cJ\cI "
#   9  "</cfif>\cJ\cI\cI \cJ\cI BOOL is true!\cJ"
#   10 "<cfelse>\cJ\cIBOOL is false!\cJ"
#   11 '</cfif>                         <--- after: found_i
my $opening_tag = $rules->{tag_start}[0] . $tag_name;
my $nested = 0; # count of nested open tags found.
while ((($chunks[++$found_i] =~ m/^$closing_tag/)
            ? ($nested > 0 ? $nested-- : 0)
            : ($chunks[$found_i] =~ m/^$opening_tag/ ? ++$nested : 1))
       && $found_i < @chunks) {}
My question is do you see any problem with the code? I'm sure some of you came across similar problems and might have some knowledge of possible pitfalls/hidden bugs. Please feel free to add/subtract from this code.. all suggestions are much appreciated ;-).

Cheers,

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

Replies are listed 'Best First'.
(crazyinsomniac) Re: Template Parsing - Finding tag pairs.
by crazyinsomniac (Prior) on Dec 25, 2001 at 11:07 UTC
    My question is do you see any problem with the code?

    Yes. For one, half of it is commented out, and there's waaay too much whitespace. But seriously, maintainability is key (and that is not hidden). I took a little time to get this running, and hopefully I'll be the only one who has to (you should really provide a runnable code example in the future; it makes pitfalls easier to spot ;D). The only thing I'd do differently is take this logic out of the while condition (and use a few ifs and elses here and there).

    #!/usr/bin/perl -wl
    use strict;
    # have to search a little differently for nested tags
    # making sure that an ending tag belonging to
    # a nested opening tag is not processed as the ending
    # tag for the current opening tag.
    # In case i sounded awkward, here's a little diagram:
    #
    # <cfif>          <--- this tag
    #    <cfif>       <--- nested open tag
    #    </cfif>      <--- end tag for the nested open tag
    # </cfif>         <--- end tag for this tag (the one that has
    #                      to be picked up)
    #
    # Actual example:
    my @chunks = ("<cfif bool eq 1>\cJ\cI ",
                  "<cfif foo = bar>\cJ\cI\cI ",
                  "<cfif bar = foo>\cJ\cI\cI ",
                  "</cfif>\cJ\cI ",
                  "</cfif>\cJ\cI\cI \cJ\cI BOOL is true!\cJ",
                  "<cfelse>\cJ\cIBOOL is false!\cJ",
                  '</cfif>');
    my $opening_tag = qr/\<cfif/;
    my $closing_tag = qr/\<\/cfif/;
    my $found_i = 0;
    my $nested = 0; # count of nested open tags found.
    while (
        (
            ($chunks[++$found_i] =~ m/^$closing_tag/)
            ? ( ($nested > 0) ? ($nested--) : (0) )
            : ( ($chunks[$found_i] =~ m/^$opening_tag/) ? (++$nested) : (1) )
        )
        && ( $found_i < @chunks )
    ) {
        print "F: $found_i ",
              "N: $nested ",
              "C: $chunks[$found_i]",
              ;
    }
    __END__
    F:\dev\vladb>perl nestag.pl
    F: 1 N: 1 C: <cfif foo = bar>
    F: 2 N: 2 C: <cfif bar = foo>
    F: 3 N: 1 C: </cfif>
    F: 4 N: 0 C: </cfif> BOOL is true!
    F: 5 N: 0 C: <cfelse> BOOL is false!
    Update: I suspect you'll be building some kind of data structure, and $nested seems like a prime index.
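If the parser does end up building a data structure of matched tags, one possible approach (a minimal sketch, not taken from the original parser) is to keep a stack of open-tag indices and record every open/close pair in a single pass. The name match_tag_pairs and the \b anchors in the patterns are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: maps the index of every opening chunk to the index
# of the chunk that closes it. Unbalanced tags are left out of the result.
sub match_tag_pairs {
    my ($chunks, $opening_tag, $closing_tag) = @_;
    my @open;    # stack of indices of tags still waiting for a close
    my %pair;    # opening index => closing index

    for my $i (0 .. $#{$chunks}) {
        if ($chunks->[$i] =~ m/^$opening_tag/) {
            push @open, $i;
        }
        elsif ($chunks->[$i] =~ m/^$closing_tag/ and @open) {
            my $start = pop @open;    # innermost tag that is still open
            $pair{$start} = $i;
        }
    }
    return \%pair;
}

my @chunks = ("<cfif bool eq 1>\cJ\cI ",
              "<cfif foo = bar>\cJ\cI\cI ",
              "<cfif bar = foo>\cJ\cI\cI ",
              "</cfif>\cJ\cI ",
              "</cfif>\cJ\cI\cI \cJ\cI BOOL is true!\cJ",
              "<cfelse>\cJ\cIBOOL is false!\cJ",
              '</cfif>');

my $pair = match_tag_pairs(\@chunks, qr/<cfif\b/, qr/<\/cfif\b/);
print "$_ => $pair->{$_}\n" for sort { $a <=> $b } keys %$pair;
```

The size of the stack at any point plays the same role as $nested in the loop, so it could serve as the nesting-depth index if one is needed.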

     
    ___crazyinsomniac_______________________________________
    Disclaimer: Don't blame. It came from inside the void

    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      Thanks for your comments. They are very much to the point and well taken ;-). I should definitely avoid posting snippets of code that don't run on their own, hehe. I promise to improve on that next time.

      About the while loop: I had started with a rather simple one-liner while loop for non-nested tags, which worked pretty well. When I thought of adding the nested capability, I simply took that original and added a few boolean clauses to account for (and skip) any nested tags.

      Here's the code as I have it now (slightly modified to run as a stand-alone):
      #!/usr/local/bin/perl -w
      use strict;
      my $is_nested = 0;
      my @chunks = ("<cfif bool eq 1>\cJ\cI ",
                    "<cfif foo = bar>\cJ\cI\cI ",
                    "<cfif bar = foo>\cJ\cI\cI ",
                    "</cfif>\cJ\cI ",
                    "</cfif>\cJ\cI\cI \cJ\cI BOOL is true!\cJ",
                    "<cfelse>\cJ\cIBOOL is false!\cJ",
                    '</cfif>');
      my $opening_tag = qr/\<cfif/;
      my $closing_tag = qr/\<\/cfif/;
      my $found_i = 0;
      unless ($is_nested) {
          # search for the closing pair (starting at the place the
          # first tag was found + 1)
          # note: this search is good for non-nested tags...
          # FIRST WHILE
          while (($chunks[++$found_i] !~ m/^$closing_tag/) && $found_i < @chunks) {}
      } else {
          # have to search a little differently for nested tags
          # making sure that an ending tag belonging to
          # a nested opening tag is not processed as the ending
          # tag for the current opening tag.
          # In case i sounded awkward, here's a little diagram:
          #
          # <cfif>          <--- this tag
          #    <cfif>       <--- nested open tag
          #    </cfif>      <--- end tag for the nested open tag
          # </cfif>         <--- end tag for this tag (the one that has
          #                      to be picked up)
          #
          my $nested = 0; # count of nested open tags found.
          # SECOND WHILE
          while ((($chunks[++$found_i] =~ m/^$closing_tag/)
                      ? ($nested > 0 ? $nested-- : 0)
                      : ($chunks[$found_i] =~ m/^$opening_tag/ ? ++$nested : 1))
                 && $found_i < @chunks) {
              print "F: $found_i ",
                    "N: $nested ",
                    "C: $chunks[$found_i]",
                    ;
          }
      }
      I basically expanded this clause
      ($chunks[++$found_i] !~ m/^$closing_tag/)
      found in the first while loop a little bit by adding this
      ? ($nested > 0 ? $nested-- : 0) : ($chunks[$found_i] =~ m/^$opening_tag/ ? ++$nested : 1).

      Certainly, I should agree that in terms of maintainability this may not be a perfect solution. And, therefore, I might have to move that 'logic' inside the while body using a few conditionals (ifs/elses).
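Moving the logic into the loop body with plain conditionals might look roughly like the sketch below. It is only an illustration under stated assumptions: find_closing_index is a hypothetical name, and the \b anchors are an addition so that a pattern such as <cfif does not also match a longer tag name. It also checks the bounds before reading a chunk, whereas the original condition reads $chunks[++$found_i] first and tests $found_i < @chunks afterwards, which can trigger an uninitialized-value warning when no closing tag exists.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original parser): returns the index of
# the chunk that closes the tag opened at $start, or undef if none is found.
sub find_closing_index {
    my ($chunks, $start, $opening_tag, $closing_tag) = @_;
    my $nested = 0;    # count of nested open tags seen so far

    # The range limits $i to valid indices, so no chunk past the end is read.
    for my $i ($start + 1 .. $#{$chunks}) {
        if ($chunks->[$i] =~ m/^$closing_tag/) {
            return $i if $nested == 0;    # closes the tag at $start
            $nested--;                    # closes a nested tag
        }
        elsif ($chunks->[$i] =~ m/^$opening_tag/) {
            $nested++;                    # one level deeper
        }
    }
    return;    # ran off the end: the input is unbalanced
}

my @chunks = ("<cfif bool eq 1>\cJ\cI ",
              "<cfif foo = bar>\cJ\cI\cI ",
              "<cfif bar = foo>\cJ\cI\cI ",
              "</cfif>\cJ\cI ",
              "</cfif>\cJ\cI\cI \cJ\cI BOOL is true!\cJ",
              "<cfelse>\cJ\cIBOOL is false!\cJ",
              '</cfif>');

my $end = find_closing_index(\@chunks, 0, qr/<cfif\b/, qr/<\/cfif\b/);
print defined $end ? "closing tag at index $end\n" : "no closing tag\n";
```

With $nested starting at 0 the same routine covers the non-nested case, so the separate $is_nested branch may not be needed.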

Re: Template Parsing - Finding tag pairs.
by Juerd (Abbot) on Dec 25, 2001 at 23:20 UTC
    /me fails to see why you're trying to re-invent the wheel.

    CF looks a lot like HTML tags to me. Of course, HTML::Parser is great for parsing HTML documents, even if it's pseudo-HTML. Or one of the many modules using HTML::Parser...
    If for some reason that's not usable, you could try Text::Balanced, a module that can extract something delimited by balanced HTML-like tags, and do a lot of other parsing.
    And there are many, many other prefab parsing modules. What's your motivation for not using any of those?
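As a rough illustration of the Text::Balanced suggestion, extract_tagged can pull a balanced, possibly nested, tag pair off the front of a string. This is a minimal sketch: the sample text and the two patterns are made up for the example, and real CF templates may need stricter patterns.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Sample input, invented for illustration; it has to start with the opening
# tag (optionally preceded by whitespace, which is the default prefix).
my $text = "<cfif bool eq 1><cfif foo = bar>inner</cfif>outer</cfif> trailing text";

# Both delimiters are passed as regex source strings.
# In list context the original $text is left unmodified.
my ($extracted, $remainder) =
    extract_tagged($text, '<cfif\b[^>]*>', '</cfif>');

if (defined $extracted and length $extracted) {
    print "matched:   $extracted\n";
    print "remainder: $remainder\n";
}
else {
    print "no balanced <cfif> block at the start of the text\n";
}
```

The nested <cfif> is skipped as part of the outer match, which is the same job the $nested counter does by hand.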

    2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

      I suppose that HTML::Parser is unusable for CF. The trick is that the parser must notice only CF tags. In many cases a text document with CF tags can easily look like an invalid HTML document to HTML::Parser.

      --
      Ilya Martynov (http://martynov.org/)

        I'm not familiar with CFML. There might be some tags that would make invalid syntax, but if there are not, $htmlparser->report_tags(qw/cfif cfelseif cfelse cfoutput cfinclude cfetcetera/) might provide for easy parsing using event subs.
