Re: parsing an ASP file

I think (but have not tested) that even an inefficient regex is faster than reading one character at a time. It is certainly easier to write :)

my @parsed;

while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? /gsx) {
    $1 and push @parsed, [ html => $1 ];
    $2 and push @parsed, [ asp  => $2 ];
    defined $3 and die "Unclosed ASP code block near '",
        $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
}

But, of course,

<% foo = "a mere %> breaks either simple minded solution." %>
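To see the failure concretely, here is a minimal sketch that wraps the loop above in a sub and feeds it a block with %> inside a string literal. The name parse_asp and the sample input are made up for illustration, and the die message is shortened.

```perl
use strict;
use warnings;

# Same tokenizer as above, wrapped in a sub (hypothetical name) so it can
# be called on a sample string.
sub parse_asp {
    my ($asp) = @_;
    my @parsed;
    while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? /gsx) {
        $1 and push @parsed, [ html => $1 ];
        $2 and push @parsed, [ asp  => $2 ];
        defined $3 and die "Unclosed ASP code block\n";
    }
    return @parsed;
}

# The %> inside the quoted string ends the block early.
my @parsed = parse_asp(q{<p><% foo = "a %> b" %></p>});
printf "%-4s [%s]\n", @$_ for @parsed;
```

The ASP block comes out truncated at the first %>, and the rest of the string literal plus the real closing %> end up in the following html chunk.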

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }


Re: Re: parsing an ASP file
by dada (Chaplain) on May 20, 2004 at 10:12 UTC

need

sub get_asp_blocks {
    my ($file) = @_;
    # Lexical filehandle and three-argument open, so $file is never
    # interpreted as an open mode.
    open(my $fh, '<', $file) or die "can't open '$file': $!\n";

    my $dot = 1;                           # current line number
    my @blocks = ( ["HTM", $dot, ""] );    # each block: [ type, start line, text ]
    my $state = "HTM";
    my $last = "";                         # previous character read
    while (read($fh, my $char, 1)) {
        $dot++ if $char eq "\n";
        if ($last eq "<" && $char eq "%" && $state eq "HTM") {
            chop $blocks[-1][-1];          # drop the "<" already appended
            $state = "ASP";
            push(@blocks, ["ASP", $dot, ""]);
        } elsif ($last eq "%" && $char eq ">" && $state eq "ASP") {
            chop $blocks[-1][-1];          # drop the "%" already appended
            $state = "HTM";
            push(@blocks, ["HTM", $dot, ""]);
        } else {
            $blocks[-1][-1] .= $char;
        }
        $last = $char;
    }
    close($fh);
    return @blocks;
}

cheers,
Aldo

King of Laziness, Wizard of Impatience, Lord of Hubris


Re: Re: Re: parsing an ASP file

by Juerd (Abbot) on May 23, 2004 at 22:57 UTC

my $state = "HTM";

The state is what I don't like. It means that everything needs to be done manually. So to get the line numbers, I'd probably just extend the regex with one set of all-enclosing parens (or for simple stand-alone scripts just use $&), and then count the number of \n characters found in it.

my @parsed;
my $line = 1;

while ($asp =~ /\G( ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? )/gsx) {
    $2 and push @parsed, [ $line, html => $2 ];
    $3 and push @parsed, [ $line, asp  => $3 ];
    defined $4 and die "Unclosed ASP code block starting on line $line near '",
        $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
    $line += $1 =~ tr/\n//;
}

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }


Re: Re: Re: Re: parsing an ASP file

by dada (Chaplain) on May 25, 2004 at 09:21 UTC

mine      yours

HTM 1     HTM 1
ASP 31    ASP 1
HTM 31    HTM 31
ASP 44    ASP 31
HTM 46    HTM 46
ASP 50    ASP 46
HTM 50    HTM 50
ASP 55    ASP 50
HTM 59    HTM 59
ASP 73    ASP 59
HTM 75    HTM 75

that is, it counts correctly for HTM blocks, but doesn't increment the line number for ASP blocks. I tried moving the line $line += ... before the push, but it didn't help.
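Looking at the regex, the likely cause is that a single match consumes an HTML run and the ASP block that follows it, so both get pushed with the same $line, and $line only advances after both. Moving the increment before the pushes just shifts both entries together. A minimal sketch of one possible fix, counting newlines separately after each push (the sub name and the lexical copies $html, $code and $unclosed are my own, and the die message is shortened):

```perl
use strict;
use warnings;

# Hypothetical wrapper around the line-counting loop: $line is advanced
# after the HTML run, before the ASP block is recorded.
sub parse_asp_with_lines {
    my ($asp) = @_;
    my @parsed;
    my $line = 1;
    while ($asp =~ /\G ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? /gsx) {
        my ($html, $code, $unclosed) = ($1, $2, $3);
        if (length $html) {
            push @parsed, [ $line, html => $html ];
            $line += $html =~ tr/\n//;    # newlines inside the HTML run
        }
        die "Unclosed ASP code block starting on line $line\n"
            if defined $unclosed;
        if (defined $code) {
            push @parsed, [ $line, asp => $code ];
            $line += $code =~ tr/\n//;    # newlines inside the ASP block
        }
    }
    return @parsed;
}

my @parsed = parse_asp_with_lines("a\nb<% x\ny %>c\n<% z %>");
print join(" ", @{$_}[0, 1]), "\n" for @parsed;
```

Unlike the original loop, this sketch also records empty ASP blocks and chunks whose text is "0", since it tests length and definedness rather than truth.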

cheers,
Aldo

King of Laziness, Wizard of Impatience, Lord of Hubris


Re: Re: Re: Re: Re: parsing an ASP file

by Juerd (Abbot) on May 25, 2004 at 14:47 UTC

Re: Re: Re: Re: Re: Re: parsing an ASP file

by Juerd (Abbot) on May 25, 2004 at 21:34 UTC

Re: Re: parsing an ASP file
by jryan (Vicar) on May 13, 2004 at 08:16 UTC

Ah, but a more complete version is easy to write too! :) (Although, I admit, a bit more long-winded...)


use re 'eval';

my $string = qr[
      " [^"\\]* (?:\\.|[^"\\])* "
    | ' [^'\\]* (?:\\.|[^'\\])* '
]x;

my $alist = qr[(?: [^"'>]* | $string )*]x;
my $ehead = qr[ <\w+ $alist /? > ]x;

my $textarea = qr[
    <textarea $alist>
    (?:
          [^<]*
        | < (?!/textarea>)
    )*
    </textarea>
]x;

my $asp = qr[
    <%
    (?:
          (?> [^%"']* )
        | $string
        | % (?! > )
    )+
    %>
]x;

my $html = qr[
    (?:
          (?> [^<"'] )
        | $textarea
        | $ehead
        | </\w+>
    )+
]x;

my @parsed;
# $input is assumed to hold the ASP source text to parse.
() = $input =~ /
          ($asp)  (?{ push @parsed, [asp  => $1] })
        | ($html) (?{ push @parsed, [html => $2] })
/gx;
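A quick way to check the quoted-string handling is to run just the $string and $asp patterns from above against the sample that breaks the simpler approaches. This is only a sketch: $sample and $block are names chosen for illustration, and the (?{ ... }) collection step is left out.

```perl
use strict;
use warnings;

# Quoted strings, copied from the patterns above.
my $string = qr[
      " [^"\\]* (?:\\.|[^"\\])* "
    | ' [^'\\]* (?:\\.|[^'\\])* '
]x;

# An ASP block: a %> inside a quoted string does not end it.
my $asp = qr[
    <%
    (?:
          (?> [^%"']* )
        | $string
        | % (?! > )
    )+
    %>
]x;

my $sample = q{<p><% foo = "a mere %> in a string" %></p>};
my ($block) = $sample =~ /($asp)/;
print "$block\n";
```

Here the whole block, including the %> inside the double-quoted string, is captured as one piece.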
