in reply to parsing an ASP file

I think (but have not tested) that even an inefficient regex is faster than reading one character at a time. It is certainly easier to write :)

my @parsed;
while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? /gsx) {
    $1 and push @parsed, [ html => $1 ];
    $2 and push @parsed, [ asp  => $2 ];
    defined $3 and die "Unclosed ASP code block near '",
        $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
}

But, of course,
<% foo = "a mere %> breaks either simple minded solution." %>
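
To make the failure concrete, here is a small sketch (the one-line page in $asp is made up for illustration) showing where a non-greedy <%(.*?)%> stops:

```perl
use strict;
use warnings;

# Hypothetical one-line ASP page with "%>" inside a string literal.
my $asp = q{<% foo = "a mere %> breaks either simple minded solution." %>};

# The non-greedy match ends at the first "%>", which sits inside the string.
my ($code) = $asp =~ /<%(.*?)%>/s;

print "ASP block seen by the regex: [$code]\n";
```

The captured block ends in the middle of the string literal, and the rest of the statement would be treated as HTML.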

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Re: parsing an ASP file
by dada (Chaplain) on May 20, 2004 at 10:12 UTC
    yep. one thing I forgot to mention is that, for the application I'm currently writing (which is basically an ASP cross-reference generator) I need to have the line number where each block appears. so, the code I'm using is something more like:
    sub get_asp_blocks {
        my ($file) = @_;
        open(my $fh, '<', $file) or die "can't open '$file': $!\n";
        my $dot    = 1;                      # current line number
        my @blocks = ( ["HTM", $dot, ""] );  # [type, start line, text]
        my $state  = "HTM";
        my $last   = "";                     # previous character read
        my $char;
        while (read($fh, $char, 1)) {
            $dot++ if $char eq "\n";
            if ($last eq "<" && $char eq "%" && $state eq "HTM") {
                chop $blocks[-1][-1];        # drop the "<" already appended
                $state = "ASP";
                push(@blocks, ["ASP", $dot, ""]);
            }
            elsif ($last eq "%" && $char eq ">" && $state eq "ASP") {
                chop $blocks[-1][-1];        # drop the "%" already appended
                $state = "HTM";
                push(@blocks, ["HTM", $dot, ""]);
            }
            else {
                $blocks[-1][-1] .= $char;
            }
            $last = $char;
        }
        close($fh);
        return @blocks;
    }
    this way, each element of the returned array contains three elements: the type (ASP or HTM), the line number, and the block itself.
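
    As a rough illustration of consuming that structure: the sample list below is hand-written in the [type, line, text] shape described above (in real use it would come from get_asp_blocks($file)), and @asp_lines is a made-up name.

    ```perl
    use strict;
    use warnings;

    # Hand-written sample in the [type, line, text] shape; assumed to be
    # what the sub returns for a five-line page with one ASP block.
    my @blocks = (
        [ "HTM", 1, "<html>\n<body>\n" ],
        [ "ASP", 3, " Response.Write(now) " ],
        [ "HTM", 3, "\n</body>\n</html>\n" ],
    );

    # Collect the starting line of every ASP block.
    my @asp_lines = map { $_->[1] } grep { $_->[0] eq "ASP" } @blocks;

    for my $block (@blocks) {
        my ($type, $line, $text) = @$block;
        printf "%s block at line %d (%d chars)\n", $type, $line, length $text;
    }
    ```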

    cheers,
    Aldo

    King of Laziness, Wizard of Impatience, Lord of Hubris

      my $state = "HTM";

      The state is what I don't like. It means that everything needs to be done manually. So to get the line numbers, I'd probably just extend the regex with one set of all-enclosing parens (or for simple stand-alone scripts just use $&), and then count the number of \n characters found in it.

      my @parsed;
      my $line = 1;
      while ($asp =~ /\G( ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? )/gsx) {
          $2 and push @parsed, [ $line, html => $2 ];
          $3 and push @parsed, [ $line, asp  => $3 ];
          defined $4 and die "Unclosed ASP code block starting on line $line near '",
              $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
          $line += $1 =~ tr/\n//;
      }
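
      For what it's worth, the counting step on its own looks like this. A minimal sketch with a made-up $chunk; tr with an empty replacement list only counts and leaves the string alone:

      ```perl
      use strict;
      use warnings;

      # Hypothetical chunk of matched text spanning three lines.
      my $chunk = "first line\nsecond line\nthird";

      # With an empty replacement list, tr/// returns the number of
      # matching characters and does not modify $chunk.
      my $newlines = ($chunk =~ tr/\n//);

      print "newlines: $newlines\n";   # prints "newlines: 2"
      ```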

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        well, your code surely looks good, but the line count seems to be off. on a simple ASP page of mine I get these results:

        mine      yours
        HTM 1     HTM 1
        ASP 31    ASP 1
        HTM 31    HTM 31
        ASP 44    ASP 31
        HTM 46    HTM 46
        ASP 50    ASP 46
        HTM 50    HTM 50
        ASP 55    ASP 50
        HTM 59    HTM 59
        ASP 73    ASP 59
        HTM 75    HTM 75

        that is, it counts correctly for HTM blocks, but doesn't increment the line number for ASP blocks. I tried moving the line $line += ... before the push, but it didn't help.
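
        One possible explanation, as a sketch rather than a fix verified against the page in question: each match covers an HTML part and the ASP part that follows it, both are pushed with the same $line, and $line only moves once both pushes are done. Advancing the counter after each push separately should give numbers like the "mine" column. parse_asp_with_lines is a hypothetical name and the sample page is made up.

        ```perl
        use strict;
        use warnings;

        # Parse $asp into [line, type, text] records, advancing the line
        # counter after each pushed block instead of once per iteration.
        sub parse_asp_with_lines {
            my ($asp) = @_;
            my @parsed;
            my $line = 1;
            while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? /gsx) {
                my ($html, $code, $unclosed) = ($1, $2, $3);
                if (length $html) {
                    push @parsed, [ $line, html => $html ];
                    $line += ($html =~ tr/\n//);   # lines used up by the HTML part
                }
                die "Unclosed ASP code block starting on line $line.\n"
                    if defined $unclosed;
                if (defined $code) {
                    push @parsed, [ $line, asp => $code ];
                    $line += ($code =~ tr/\n//);   # lines used up by the ASP part
                }
            }
            return @parsed;
        }

        # Made-up five-line page: the ASP block starts on line 3 and ends on line 4.
        my @blocks = parse_asp_with_lines("<p>\nhello\n<% a = 1\nb = 2 %>\n</p>\n");
        print "$_->[1] $_->[0]\n" for @blocks;   # html 1, asp 3, html 4
        ```

        Using length and defined instead of plain truth tests also keeps a block whose whole text is "0".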

        cheers,
        Aldo

        King of Laziness, Wizard of Impatience, Lord of Hubris

Re: Re: parsing an ASP file
by jryan (Vicar) on May 13, 2004 at 08:16 UTC

    Ah, but a more complete version is easy to write too! :) (Although, I admit, a bit more longwinded...)

    use re 'eval';

    # quoted strings, with backslash escapes
    my $string = qr[
        " [^"\\]* (?:\\.|[^"\\])* "
      | ' [^'\\]* (?:\\.|[^'\\])* '
    ]x;
    my $alist    = qr[(?: [^"'>]* | $string )*]x;    # attribute list of a tag
    my $ehead    = qr[ <\w+ $alist /? > ]x;          # opening (or empty) tag
    my $textarea = qr[ <textarea $alist> (?: [^<]* | < (?!/textarea>) )* </textarea> ]x;
    my $asp      = qr[ <% (?: (?> [^%"']* ) | $string | % (?! > ) )+ %> ]x;
    my $html     = qr[ (?: (?> [^<"'] ) | $textarea | $ehead | </\w+> )+ ]x;

    # $source is assumed to hold the text of the ASP page
    my @parsed;
    () = $source =~ /
          ($asp)  (?{ push @parsed, [asp  => $1] })
        | ($html) (?{ push @parsed, [html => $2] })
    /gx;
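
    A cut-down sketch of just the string-aware part, to show it coping with the "%>" inside a string literal. This is a simplified adaptation, not the full set of patterns above: it assumes backslash-escaped quoted strings, and $source is a made-up one-line page.

    ```perl
    use strict;
    use warnings;

    # Quoted strings with backslash escapes (simplified).
    my $string = qr{ " (?:\\.|[^"\\])* " | ' (?:\\.|[^'\\])* ' }x;

    # An ASP block: plain text, whole strings, or a "%" not followed by ">".
    my $asp = qr{ <% (?: (?> [^%"']+ ) | $string | % (?! > ) )+ %> }x;

    my $source = q{<% foo = "a mere %> no longer ends the block." %>};
    my ($block) = $source =~ /($asp)/;

    print "matched whole block\n" if $block eq $source;
    ```

    Because the string is consumed as one unit, the "%>" inside it is never considered as a closing delimiter.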