Re: Re{4): Template Parsing - Finding tag pairs.

I guess sharing a stack would help...

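# this code is missing a lot. don't expect it to work :)
use strict;
use warnings;
use HTML::Parser;

{
 my @stack;  # open pseudo-tags, shared by all three handlers
 sub start {
  my ($tag) = @_;
  push @stack, $tag;
 }
 sub end {
  my ($tag) = @_;
  # <cfelse> has no end tag of its own; </cfif> closes it
  if ($tag eq 'cfif' and @stack and $stack[-1] eq 'cfelse'){
   pop @stack;
  }
  die "Invalid code" if (pop(@stack) // '') ne $tag;
 }
 sub text {
  my ($text) = @_;
  # use @stack to determine where we are...
 }
}
# sample input, same as the example below
my $cfml = '<body><cfif>foo<cfelse><b>bar</b></cfif></body>';
my $parser = HTML::Parser->new(start_h => [\&start, 'tagname'],
                               end_h   => [\&end,   'tagname'],
                               text_h  => [\&text,  'text'],
);
$parser->report_tags(qw/cfif cfelse cfend/);
$parser->parse($cfml);
$parser->eof;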

---

<body><cfif>foo<cfelse><b>bar</b></cfif></body>
==>  text  '<body>';
     start 'cfif';
     text  'foo';
     start 'cfelse';
     text  '<b>bar</b>';
     end   'cfif';

2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$


Replies are listed 'Best First'.
Re{6): Template Parsing - Finding tag pairs. by IlyaM (Parson) on Dec 26, 2001 at 00:39 UTC
The problem with embedded languages (like PHP, CF, etc.) which use tag-like syntax is that:

1. They can be used with non-HTML documents, which can confuse "normal" HTML parsers.
2. Even for HTML documents, the structure of the HTML document doesn't necessarily match the structure of the embedded language's pseudo-tag tree. That is, a pseudo-tag can be inside an HTML tag, and it can cross the boundaries of HTML tags.
3. HTML tags can be generated by pseudo-tags. In this case the input document can often look seriously broken to a "normal" HTML parser.

A proper parser for an embedded language should ignore all HTML markup (or any other markup, or any text which looks like markup). It should take into account only its own pseudo-tags.

Is it possible to make HTML::Parser ignore everything except pseudo-tags? I don't think so, but I could be wrong.

-- Ilya Martynov (http://martynov.org/)
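To make the "take into account only its pseudo-tags" idea concrete, here is a minimal sketch of a scanner that never looks at HTML at all: it searches for the listed pseudo-tags with a regex and reports everything in between as plain text. The function name `scan_pseudo_tags` and the event format are made up for illustration and are not part of HTML::Parser or any CFML tool. It also assumes that a `<cfelse>` is closed implicitly by `</cfif>`, and that attribute values never contain `<` or `>`.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $src for the named pseudo-tags only.
# Everything else, including real HTML markup, is passed through as text.
sub scan_pseudo_tags {
    my ($src, @names) = @_;
    my $alt = join '|', map { quotemeta } @names;
    my (@events, @stack);
    my $pos = 0;   # end of the previous pseudo-tag

    while ($src =~ m{<(/?)($alt)\b[^<>]*>}gi) {
        my ($close, $tag) = ($1, lc $2);
        my $start = $-[0];

        # whatever sits between two pseudo-tags is just text
        push @events, [text => substr($src, $pos, $start - $pos)]
            if $start > $pos;
        $pos = $+[0];

        if ($close) {
            # assumption: an open <cfelse> is closed by </cfif>
            pop @stack
                if $tag eq 'cfif' && @stack && $stack[-1] eq 'cfelse';
            my $open = pop @stack;
            die "Invalid code: unexpected </$tag>\n"
                unless defined $open && $open eq $tag;
            push @events, [end => $tag];
        }
        else {
            push @stack, $tag;
            push @events, [start => $tag];
        }
    }

    # trailing text after the last pseudo-tag
    push @events, [text => substr($src, $pos)] if $pos < length $src;
    die "Invalid code: unclosed <$stack[-1]>\n" if @stack;
    return @events;
}

# A pseudo-tag inside a real tag is no problem here, because the
# surrounding HTML is never parsed.
my @events = scan_pseudo_tags('<p <cfif>class="x"</cfif>>', qw(cfif cfelse));
print "$_->[0] '$_->[1]'\n" for @events;
```

This is only a sketch: a real template language would also need comments, quoting rules and attribute parsing, none of which are handled here.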
Re: Re{6): Template Parsing - Finding tag pairs. by Juerd (Abbot) on Dec 26, 2001 at 00:45 UTC
`$whatever = 'CFML';`

Having normal HTML in $whatever, or $whatever in normal HTML, is not a problem for HTML::Parser if you use the `report_tags()` method. That makes the parser ignore unknown tags, leaving non-$whatever tags as they are. So the three points you mention are irrelevant.

Yes, it IS possible to make HTML::Parser ignore everything except pseudo-tags. That's what I've been talking about all along: the `report_tags()` method. sigh :)
Re{8): Template Parsing - Finding tag pairs. by IlyaM (Parson) on Dec 26, 2001 at 02:30 UTC
HTML::Parser doesn't want to parse broken HTML, as in this example (a pseudo-tag inside a real tag):

use strict;
use warnings;

my $data = <<DATA;
<p <pseudotag>>
DATA

use HTML::Parser;

my $p = new HTML::Parser();
$p->handler('start', \&start_sub, 'text');
$p->report_tags('pseudotag');
$p->parse($data);
$p->eof;

sub start_sub {
    my $text = shift;
    print "$text\n";
}

-- Ilya Martynov (http://martynov.org/)