in reply to Re(4): Template Parsing - Finding tag pairs.
in thread Template Parsing - Finding tag pairs.

I guess sharing a stack would help...
# this code is missing a lot. don't expect it to work :)
use HTML::Parser;

{
    my @stack;    # open pseudo-tags, innermost last

    sub start {
        my ($tag) = @_;
        push @stack, $tag;
    }

    sub end {
        my ($tag) = @_;
        # </cfif> also closes a pending <cfelse>
        if ($tag eq 'cfif' and $stack[-1] eq 'cfelse') {
            pop @stack;
        }
        die "Invalid code" if pop(@stack) ne $tag;
    }

    sub text {
        # use @stack to determine where we are...
    }
}

my $parser = HTML::Parser->new(
    start_h => [\&start, 'tagname'],
    end_h   => [\&end,   'tagname'],
    text_h  => [\&text,  'text'],
);
$parser->report_tags(qw/cfif cfelse cfend/);
$parser->parse($cfml);
$parser->eof;
---
<body><cfif>foo<cfelse><b>bar</b></cfif></body>
==>
text '<body>';
start 'cfif';
text 'foo';
start 'cfelse';
text '<b>bar</b>';
end 'cfif';
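
To show what the "use @stack to determine where we are" comment could mean, here is a minimal sketch that replays the event stream above by hand instead of running HTML::Parser, so it only demonstrates the stack logic. The @events list and the %branch hash are made up for illustration; the handler names follow the sketch above.

```perl
use strict;
use warnings;

my (@stack, %branch);

sub start {
    my ($tag) = @_;
    push @stack, $tag;
}

sub end {
    my ($tag) = @_;
    # </cfif> also closes a pending <cfelse>
    pop @stack if $tag eq 'cfif' and @stack and $stack[-1] eq 'cfelse';
    my $open = pop @stack;
    die "Invalid code\n" unless defined $open and $open eq $tag;
}

sub text {
    my ($text) = @_;
    # the top of the stack names the current branch; 'outside' if nothing is open
    my $where = @stack ? $stack[-1] : 'outside';
    $branch{$where} .= $text;
}

my %handler = (start => \&start, end => \&end, text => \&text);

# hand-written copy of the event stream listed above (an assumption, not parser output)
my @events = (
    [text  => '<body>'],
    [start => 'cfif'],
    [text  => 'foo'],
    [start => 'cfelse'],
    [text  => '<b>bar</b>'],
    [end   => 'cfif'],
);
$handler{ $_->[0] }->($_->[1]) for @events;

print "$_: $branch{$_}\n" for sort keys %branch;
```

The text handler never looks at the markup itself. It only asks the shared stack which pseudo-tag is innermost, which is all a template engine needs to decide which branch a chunk of text belongs to.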

2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

Re(6): Template Parsing - Finding tag pairs.
by IlyaM (Parson) on Dec 26, 2001 at 00:39 UTC
    The problems with embedded languages (like PHP, CF, etc.) that use tag-like syntax are:
    • They can be used in non-HTML documents, which can confuse "normal" HTML parsers.
    • Even in HTML documents, the structure of the HTML doesn't necessarily match the tree of the embedded language's pseudo-tags. A pseudo-tag can sit inside an HTML tag, or it can cross the boundaries of HTML tags.
    • HTML tags can be generated by pseudo-tags. In that case the input document can often look seriously broken to a "normal" HTML parser.
    A proper parser for an embedded language should ignore all HTML markup (or any other markup, or any text that looks like markup). It should take into account only its own pseudo-tags. Is it possible to make HTML::Parser ignore everything except pseudo-tags? I don't think so, but I could be wrong.
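
    One way to get a parser that sees only pseudo-tags is to skip the HTML parser altogether and scan for the pseudo-tags with a regex. A rough sketch, assuming the only pseudo-tags are cfif and cfelse and that they carry no attributes (real CFML tags take conditions and attributes, which this pattern does not handle); check_pairs is a made-up name:

```perl
use strict;
use warnings;

# Returns the number of complete <cfif>...</cfif> pairs; dies on a mismatch.
sub check_pairs {
    my ($src) = @_;
    my @stack;
    my $pairs = 0;

    # Only pseudo-tags are matched; everything else, HTML included, is skipped.
    while ($src =~ m{<(/?)(cfif|cfelse)>}gi) {
        my ($close, $tag) = ($1, lc $2);

        if ($tag eq 'cfelse') {
            die "misplaced cfelse\n"
                unless !$close and @stack and $stack[-1] eq 'cfif';
            $stack[-1] = 'cfelse';    # innermost cfif is now in its else branch
        }
        elsif ($close) {
            die "unexpected </cfif>\n" unless @stack;
            pop @stack;
            $pairs++;
        }
        else {
            push @stack, 'cfif';
        }
    }

    die "unclosed cfif\n" if @stack;
    return $pairs;
}

# pseudo-tags inside a real tag, and crossing HTML tag boundaries
print check_pairs('<p <cfif>class="a"<cfelse>class="b"</cfif>>x</p>'), "\n";
print check_pairs('<b><cfif>foo</b><cfelse>bar</cfif>'), "\n";
```

    Because the scanner never tries to understand the surrounding markup, it does not matter whether that markup is well-formed HTML, broken HTML, or not HTML at all.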

    --
    Ilya Martynov (http://martynov.org/)

      $whatever = 'CFML';
      Having normal HTML in $whatever or $whatever in normal HTML is not a problem with HTML::Parser, if you use the report_tags() method. That'll have the parser ignore unknown tags, leaving non-$whatever tags for what they are.

      So the three points you mention are irrelevant.
      Yes, it IS possible to make HTML::Parser ignore everything except pseudo-tags. That's what I've been talking about all the time - the report_tags() method. *sigh* :)

      2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

        HTML::Parser doesn't want to parse broken HTML, as in this example (a pseudo-tag inside a real tag):

        use strict;
        use warnings;

        my $data = <<DATA;
        <p <pseudotag>>
        DATA

        use HTML::Parser;

        my $p = HTML::Parser->new();
        $p->handler('start', \&start_sub, 'text');
        $p->report_tags('pseudotag');
        $p->parse($data);
        $p->eof;

        sub start_sub {
            my $text = shift;
            print "$text\n";
        }

        --
        Ilya Martynov (http://martynov.org/)