in reply to Re: Recognizing Perl in text
in thread Recognizing Perl in text

I think a problem with the syntax check idea is: what exactly would you feed to perl -c?

If you do it line by line, and have a code snippet like this

    for my $foo (@foo) {
        for my $bar (@$foo) {
            push @{ $self->{results} },
                {
                    baz => foo( $bar->{baz},
                                $bar->{quux}[1] )
                };
        }
    }

not a single line (on its own) would pass a syntax check, while taken as a whole, the snippet is perfectly valid Perl code.

Of course, you could try to work around that problem by passing multiline snippets to the syntax checks, but then the number of possible combinations is going to explode rather soon, even for moderate file sizes...  So you'd at least need some additional heuristic to identify likely beginnings of code sections, or some such, in order to make this approach feasible in practice.
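To make the cost concrete, here is a minimal sketch of what "feeding a chunk to perl -c" could look like. The helper names `syntax_ok` and `contiguous_ranges` are hypothetical, and the sketch assumes a Unix-like shell (for the `2>&1` redirect) and that only contiguous line ranges count as candidates, in which case n lines already give n*(n+1)/2 chunks, each costing one perl process.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns true if `perl -c` accepts $chunk.
# Caution: perl -c still executes BEGIN blocks and `use` statements,
# so untrusted text should not be fed to it unguarded.
sub syntax_ok {
    my ($chunk) = @_;
    my ($fh, $file) = tempfile(UNLINK => 1);
    print {$fh} $chunk;
    close $fh or die "close failed: $!";
    # $^X is the currently running perl; its messages are captured and ignored.
    my $output = qx{"$^X" -c "$file" 2>&1};
    return $? == 0;
}

# Every contiguous range of lines is a candidate chunk: n*(n+1)/2 of them.
sub contiguous_ranges {
    my ($n) = @_;
    my @ranges;
    for my $start (0 .. $n - 1) {
        push @ranges, [ $start, $_ ] for $start .. $n - 1;
    }
    return @ranges;
}

my @lines = (
    'for my $foo (@foo) {',
    '    for my $bar (@$foo) {',
    '        push @{ $self->{results} },',
    '            {',
    '                baz => foo( $bar->{baz},',
    '                            $bar->{quux}[1] )',
    '            };',
    '    }',
    '}',
);

my @ranges = contiguous_ranges(scalar @lines);
printf "%d lines give %d candidate chunks\n", scalar @lines, scalar @ranges;
```

A brute-force scan would call `syntax_ok` once per range, which is why some heuristic for picking likely starting points matters.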

Re^3: Recognizing Perl in text
by LanX (Saint) on Jan 06, 2011 at 15:21 UTC
    With a clever strategy it's possible to significantly limit the number of possible chunks to check!

    Simply start checking the most indented line and successively add surrounding lines.

        for my $foo (@foo) {                          # 8 fails
            for my $bar (@$foo) {                     # 6 fails
                push @{ $self->{results} },           # 5 works
                    {                                 # 3 fails
                        baz => foo( $bar->{baz},      # 2 works
                                    $bar->{quux}[1] ) # 1 fails
                    };                                # 4 works
            }                                         # 7 works
        }                                             # 9 works

    This way the overhead for identifying n lines of code is (statistically) at most linear!
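The expansion order described above can be sketched roughly as follows. This is only one possible reading of the strategy: the names `grow_from_deepest`, `indent_of` and `syntax_ok` are made up for illustration, ties between the line above and the line below are resolved upward (an assumption that happens to reproduce the numbering in the example), and the perl -c wrapper assumes a Unix-like shell.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical perl -c wrapper: true if the chunk compiles.
# Note that perl -c still runs BEGIN blocks and `use` statements.
sub syntax_ok {
    my ($chunk) = @_;
    my ($fh, $file) = tempfile(UNLINK => 1);
    print {$fh} $chunk;
    close $fh or die "close failed: $!";
    my $output = qx{"$^X" -c "$file" 2>&1};   # messages captured and ignored
    return $? == 0;
}

# Length of the leading whitespace (a tab counts as one column here).
sub indent_of {
    my ($line) = @_;
    my ($ws) = $line =~ /^(\s*)/;
    return length $ws;
}

# Start at the most indented line and add one neighbouring line per step,
# preferring the more indented neighbour (ties go to the line above).
# Returns one [first, last, passed] triple per syntax check.
sub grow_from_deepest {
    my ($lines, $check) = @_;
    return unless @$lines;
    my ($first) = sort {
        indent_of($lines->[$b]) <=> indent_of($lines->[$a]) || $a <=> $b
    } 0 .. $#$lines;
    my $last = $first;
    my @log;
    while (1) {
        my $ok = $check->(join "\n", @{$lines}[$first .. $last]) ? 1 : 0;
        push @log, [ $first, $last, $ok ];
        my $up   = $first > 0        ? indent_of($lines->[$first - 1]) : -1;
        my $down = $last  < $#$lines ? indent_of($lines->[$last + 1])  : -1;
        last if $up < 0 && $down < 0;          # nothing left to add
        if ($up >= $down) { $first-- } else { $last++ }
    }
    return @log;
}

my @lines = (
    'for my $foo (@foo) {',
    '    for my $bar (@$foo) {',
    '        push @{ $self->{results} },',
    '            {',
    '                baz => foo( $bar->{baz},',
    '                            $bar->{quux}[1] )',
    '            };',
    '    }',
    '}',
);

my @log = grow_from_deepest(\@lines, \&syntax_ok);
printf "check %d: lines %d-%d %s\n",
    $_ + 1, $log[$_][0] + 1, $log[$_][1] + 1, $log[$_][2] ? 'works' : 'fails'
    for 0 .. $#log;
```

Each step adds exactly one line, so a block of n lines costs n calls to the checker, which is where the linear bound comes from.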

    UPDATE: And it's still possible to rely on the existence of trailing semicolons or braces before running a syntax check.

    Cheers Rolf

Re^3: Recognizing Perl in text
by LanX (Saint) on Jan 06, 2011 at 14:48 UTC
    Sure, but that's why I added the update about the indentation convention.

    Do you know any man pages with Perl code that don't originate from POD? I don't...

    And I agree with Marshall who recommended scanning for trailing /;\s*$/ or /;\s*#.*$/ for a pretty good weighting heuristic.
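As a rough illustration of such a weighting, the two patterns can be combined into a per-line test and averaged over a block. The function names and the choice of "fraction of non-blank lines" as the score are assumptions made for this sketch, not something specified in the thread.

```perl
use strict;
use warnings;

# True if the line ends in a semicolon, optionally followed by a comment.
sub looks_like_statement {
    my ($line) = @_;
    return $line =~ /;\s*$/ || $line =~ /;\s*#.*$/ ? 1 : 0;
}

# Hypothetical score: fraction of non-blank lines that look like statements.
sub statement_ratio {
    my (@lines) = @_;
    my @nonblank = grep { /\S/ } @lines;
    return 0 unless @nonblank;
    my $hits = grep { looks_like_statement($_) } @nonblank;
    return $hits / @nonblank;
}

my @block = (
    'my $x = 1;',
    'print $x;   # show it',
    'Some prose here',
);
printf "ratio: %.2f\n", statement_ratio(@block);
```

A high ratio could then be used to decide which blocks are worth handing to a real syntax check at all.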

    Cheers Rolf