in reply to Re: Re: Template Parsing - Finding tag pairs.
in thread Template Parsing - Finding tag pairs.

I'm not familiar with CFML. Some of its tags might not be valid syntax for an HTML parser, but if there are none, $htmlparser->report_tags(qw/cfif cfelseif cfelse cfoutput cfinclude cfetcetera/) might make for easy parsing using event subs.

2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

Re(4): Template Parsing - Finding tag pairs.
by IlyaM (Parson) on Dec 26, 2001 at 00:17 UTC
    A simple example: how will it handle CF tags inside HTML tags? Or CF tags in non-HTML text documents (which can contain anything)?

    --
    Ilya Martynov (http://martynov.org/)

      I guess sharing a stack would help...
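      # this code is missing a lot. don't expect it to work :)
      use strict;
      use warnings;
      use HTML::Parser;

      {
          my @stack;    # open pseudo-tags, shared by all handlers

          sub start {
              my ($tag) = @_;
              push @stack, $tag;
          }

          sub end {
              my ($tag) = @_;
              # </cfif> also closes a pending <cfelse>
              if ($tag eq 'cfif' and @stack and $stack[-1] eq 'cfelse') {
                  pop @stack;
              }
              die "Invalid code" if !@stack or pop(@stack) ne $tag;
          }

          sub text {
              # use @stack to determine where we are...
          }
      }

      my $cfml = '<body><cfif>foo<cfelse><b>bar</b></cfif></body>';

      my $parser = HTML::Parser->new(
          start_h => [\&start, 'tagname'],
          end_h   => [\&end,   'tagname'],
          text_h  => [\&text,  'text'],
      );
      $parser->report_tags(qw/cfif cfelse cfend/);
      $parser->parse($cfml);
      $parser->eof;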
      ---
      <body><cfif>foo<cfelse><b>bar</b></cfif></body>
      ==>
      text '<body>';
      start 'cfif';
      text 'foo';
      start 'cfelse';
      text '<b>bar</b>';
      end 'cfif';

      2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

        The problem with embedded languages (like PHP, CF, etc.) that use tag-like syntax is that:
        • They can be used with non-HTML documents, which can confuse "normal" HTML parsers.
        • Even for HTML documents, the structure of the HTML document doesn't necessarily match the structure of the embedded language's pseudo-tag tree. That is, a pseudo-tag can be inside an HTML tag, and it can cross the boundaries of HTML tags.
        • HTML tags can be generated by pseudo-tags. In this case the input document can often look seriously broken to a "normal" HTML parser.
        A proper parser for an embedded language should ignore all HTML markup (or any other markup, or any text which looks like markup). It should take into account only its own pseudo-tags. Is it possible to make HTML::Parser ignore everything except pseudo-tags? I don't think so, but I could be wrong.
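
        To illustrate the "pseudo-tags only" idea, here is a minimal sketch that skips HTML::Parser and scans for the pseudo-tags with a regex, treating everything between them as opaque text. It assumes that every pseudo-tag name starts with "cf" and that attribute values never contain a ">" character; tokenize_cf is a hypothetical name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a template into text and cf* pseudo-tag tokens.
# Assumes tag names start with "cf" and attributes contain no ">".
sub tokenize_cf {
    my ($src) = @_;
    my @tokens;
    my $pos = 0;    # end of the previous pseudo-tag
    while ($src =~ m{<(/?)(cf\w+)\b([^>]*)>}gi) {
        my ($close, $name, $attr) = ($1, lc $2, $3);
        # Save the match offsets before any other regex overwrites them.
        my ($tag_start, $tag_end) = ($-[0], $+[0]);
        # Everything since the previous pseudo-tag is opaque text,
        # HTML markup included.
        push @tokens, [ 'text', substr($src, $pos, $tag_start - $pos) ]
            if $tag_start > $pos;
        $attr =~ s/^\s+|\s+$//g;
        push @tokens, [ ($close ? 'end' : 'start'), $name, $attr ];
        $pos = $tag_end;
    }
    push @tokens, [ 'text', substr($src, $pos) ] if $pos < length $src;
    return @tokens;
}

# A pseudo-tag inside an HTML attribute value is found like any other.
my @tokens = tokenize_cf('<a href="<cfif x>1<cfelse>2</cfif>">link</a>');
printf "%-5s %s\n", $_->[0], $_->[1] for @tokens;
```

        Because the HTML is never parsed, it does not matter whether the surrounding document is valid HTML, broken HTML, or not HTML at all.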

        --
        Ilya Martynov (http://martynov.org/)

      Precisely the point, IlyaM ;-). It all boils down to parsing CFML code like this:
      <cfset var = "foobar">
      <cfset bool = "true">
      <cfif bool eq "true">
        print something here if bool is true.
      <cfelse>
        <cfif var eq "foobar">
          foobar... Will dump query:
          <cfquery name="foobar_query">
            #foo#, #bar#
          </cfquery>
          <cfoutput>foobar = #foobar#</cfoutput>
        </cfif>
      </cfif>
      This has nothing to do with HTML, so using HTML::Parser (or any other HTML parser) is either overkill or plain useless in my case. I've posted a node in Meditations explaining the design behind my module (a ColdFusion parser). Read more here.

      "There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith
        With
        @stack = ( ['cfif', 'bool eq "true"'], ['cfelse'] );
        the subs could start with something like
        sub start {
            my ($tag) = @_;
            # skip everything inside a false <cfif> branch, except its <cfelse>
            return if $stack[-1][0] eq 'cfif' and $tag ne 'cfelse' and not istrue($stack[-1][1]);
            # skip everything inside <cfelse> when the <cfif> condition was true
            return if $stack[-1][0] eq 'cfelse' and istrue($stack[-2][1]);
        }
        sub end {
            my ($tag) = @_;
            return if $stack[-1][0] eq 'cfif' and $tag ne 'cfif' and not istrue(...);
            return if $stack[-1][0] eq 'cfelse' and istrue(...);
        }
        sub text {
            # something like what i used in start()
        }
        This piece of pseudo-code doesn't handle nested cfifs, but nothing is impossible. Maybe the bool eq "true" syntax is too hard for HTML::Parser, but nesting and ignoring will of course be up to you, just like when you use your own parser.
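
        For the nested case, one possible approach is to keep a frame per open cfif that remembers whether the enclosing branch was active. The following is only a sketch under stated assumptions: it uses a plain regex split instead of HTML::Parser, it handles cfif, cfelse and /cfif only (every other tag passes through as text), and this istrue is a hypothetical evaluator that understands nothing but NAME eq "VALUE". The names render and istrue are illustrative, not from any module.

```perl
use strict;
use warnings;

# Hypothetical condition evaluator: only understands NAME eq "VALUE".
sub istrue {
    my ($expr, $vars) = @_;
    my ($name, $value) = $expr =~ /^\s*(\w+)\s+eq\s+"([^"]*)"\s*$/
        or die "Unsupported condition: $expr";
    return defined $vars->{$name} && $vars->{$name} eq $value;
}

# Return the template text with only the active branches kept.
sub render {
    my ($src, $vars) = @_;
    my @stack;       # one frame per open <cfif>
    my $out = '';
    # The capturing group makes split return the pseudo-tags as well.
    for my $chunk (split m{(<cfif\b[^>]*>|<cfelse\s*>|</cfif\s*>)}i, $src) {
        my $active = @stack ? $stack[-1]{active} : 1;
        if ($chunk =~ m{^<cfif\b([^>]*)>$}i) {
            # Conditions inside an inactive branch are never evaluated.
            my $cond = $active && istrue($1, $vars);
            push @stack, { outer => $active, cond => $cond, active => $cond };
        }
        elsif ($chunk =~ m{^<cfelse\s*>$}i) {
            die "<cfelse> without <cfif>" unless @stack;
            my $frame = $stack[-1];
            $frame->{active} = $frame->{outer} && !$frame->{cond};
        }
        elsif ($chunk =~ m{^</cfif\s*>$}i) {
            die "Unbalanced </cfif>" unless @stack;
            pop @stack;
        }
        elsif ($active) {
            $out .= $chunk;
        }
    }
    die "Unclosed <cfif>" if @stack;
    return $out;
}

# bool is not "true", so the <cfelse> branch and its nested <cfif> apply.
print render(
    '<cfif bool eq "true">yes<cfelse><cfif var eq "foobar">foobar</cfif></cfif>',
    { bool => 'false', var => 'foobar' },
), "\n";    # prints "foobar"
```

        Storing the outer state in each frame is what makes nesting work: a cfelse can only activate its branch if the enclosing branch was active to begin with.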

        2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$