in reply to Re: parsing an ASP file
in thread parsing an ASP file

yep. one thing I forgot to mention is that, for the application I'm currently writing (which is basically an ASP cross-reference generator) I need to have the line number where each block appears. so, the code I'm using is something more like:
sub get_asp_blocks {
    my ($file) = @_;
    open(FILE, $file) or die "can't open '$file': $!\n";
    my $dot = 1;                          # current line number
    my @blocks = ( ["HTM", $dot, ""] );   # each entry: [type, starting line, text]
    my $state = "HTM";
    my $last = "";                        # previous character read
    my $char;
    while (read(FILE, $char, 1)) {
        $dot++ if $char eq "\n";
        if ($last eq "<" && $char eq "%" && $state eq "HTM") {
            chop $blocks[-1][-1];         # drop the "<" already appended
            $state = "ASP";
            push(@blocks, ["ASP", $dot, ""]);
        }
        elsif ($last eq "%" && $char eq ">" && $state eq "ASP") {
            chop $blocks[-1][-1];         # drop the "%" already appended
            $state = "HTM";
            push(@blocks, ["HTM", $dot, ""]);
        }
        else {
            $blocks[-1][-1] .= $char;
        }
        $last = $char;
    }
    close(FILE);
    return @blocks;
}
this way, each element of the returned array is a reference to a three-element array: the type (ASP or HTM), the line number where the block starts, and the text of the block itself.
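for illustration, here is a minimal sketch of how a caller might walk a list of that shape. the sample data is hypothetical (hand-written, not produced from a real ASP file), and it only assumes the [type, line, text] layout described above:

```perl
use strict;
use warnings;

# Hypothetical sample in the [type, starting line, text] layout;
# in real use this list would come from get_asp_blocks($file).
my @blocks = (
    [ "HTM", 1, "<html>\n<body>\n" ],
    [ "ASP", 3, " Response.Write(\"hi\") " ],
    [ "HTM", 3, "\n</body>\n" ],
);

# Collect the starting line of every ASP block.
my @asp_lines = map { $_->[1] } grep { $_->[0] eq "ASP" } @blocks;

# Print one summary line per block.
for my $block (@blocks) {
    my ($type, $line, $text) = @$block;
    printf "%s block at line %d (%d chars)\n", $type, $line, length $text;
}
```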

cheers,
Aldo

King of Laziness, Wizard of Impatience, Lord of Hubris

Re: Re: Re: parsing an ASP file
by Juerd (Abbot) on May 23, 2004 at 22:57 UTC

    my $state = "HTM";

    The state is what I don't like. It means that everything needs to be done manually. So to get the line numbers, I'd probably just extend the regex with one set of all-enclosing parens (or for simple stand-alone scripts just use $&), and then count the number of \n characters found in it.
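A minimal sketch of the newline-counting idea on its own, assuming a hypothetical sample string: tr/\n// with an empty replacement list leaves the string unchanged and returns how many newlines it found.

```perl
use strict;
use warnings;

# Hypothetical chunk of matched text.
my $chunk = "first\nsecond\nthird";

# With an empty replacement list and no /d modifier, tr/// changes
# nothing and returns the number of characters it matched.
my $newlines = ($chunk =~ tr/\n//);

# Advance a running line counter by that amount.
my $line = 1;
$line += $newlines;

print "chunk contains $newlines newlines; the next block starts on line $line\n";
```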

    my @parsed;
    my $line = 1;
    while ($asp =~ /\G( ((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )? )/gsx) {
        $2 and push @parsed, [ $line, html => $2 ];
        $3 and push @parsed, [ $line, asp => $3 ];
        defined $4 and die "Unclosed ASP code block starting on line $line near '",
            $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
        $line += $1 =~ tr/\n//;
    }

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      well, your code looks surely good, but seems to be failing line count. on a simple ASP page of mine I get these results:

      mine       yours
      HTM  1     HTM  1
      ASP 31     ASP  1
      HTM 31     HTM 31
      ASP 44     ASP 31
      HTM 46     HTM 46
      ASP 50     ASP 46
      HTM 50     HTM 50
      ASP 55     ASP 50
      HTM 59     HTM 59
      ASP 73     ASP 59
      HTM 75     HTM 75

      that is, it counts correctly for HTM blocks, but doesn't increment the line number for ASP blocks. I tried moving the line $line += ... before the push, but it didn't help.

      cheers,
      Aldo

      King of Laziness, Wizard of Impatience, Lord of Hubris

        seems to be failing line count.

        You're right. Because the regex can match a block of HTML and a block of ASP in one go, $line has to be updated in between the two pushes, not just once at the end of the iteration. So I removed the extra set of parens and the counter line again and added two new \n-counters: one for $1 and one for $2.

        my @parsed;
        my $line = 1;
        while ($asp =~ /\G((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) {
            $1 and push @parsed, [ $line, html => $1 ];
            $line += $1 =~ tr/\n//;
            $2 and push @parsed, [ $line, asp => $2 ];
            $line += $2 =~ tr/\n// if defined $2;
            defined $3 and die "Unclosed ASP code block starting on line $line near '",
                $asp =~ /\G(<%\s*\n?.*)/g, "'.\n";
        }
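To check the per-block counting on something small, here is a sketch that wraps this kind of loop in a sub and runs it on a short made-up page. The name parse_asp, the sample page, and the shortened die message are assumptions for the example; the captures are also copied into lexicals before counting.

```perl
use strict;
use warnings;

# parse_asp() is a hypothetical wrapper name, not part of the thread.
sub parse_asp {
    my ($asp) = @_;
    my @parsed;
    my $line = 1;
    while ($asp =~ /\G((?: [^<]+ | <(?!%) )*) (?: <%(.*?)%> | ((?=<%)) )?/gsx) {
        my ($html, $code, $unclosed) = ($1, $2, $3);

        # HTML part first: record it, then advance past its newlines.
        push @parsed, [ $line, html => $html ] if length $html;
        $line += ($html =~ tr/\n//);

        # A "<%" with no closing "%>" sets the third capture.
        die "Unclosed ASP code block starting on line $line\n"
            if defined $unclosed;

        # ASP part second, starting on the already updated line.
        if (defined $code) {
            push @parsed, [ $line, asp => $code ] if length $code;
            $line += ($code =~ tr/\n//);
        }
    }
    return @parsed;
}

# Made-up seven-line page with ASP blocks starting on lines 2 and 4.
my $page = "<html>\n<% a = 1 %>\n<p>\n<%\n b = 2\n%>\n</html>\n";
my @parsed = parse_asp($page);
print join(" ", map { "$_->[1]:$_->[0]" } @parsed), "\n";
```

Each HTML block that follows an ASP block is reported on the line where the closing %> sits, since that is where its text begins.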

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }