
Welcome to the Monastery.

"When i using perl -c it takes too much time to check ..."

As I came to post this, I saw that ++vr had posted a Win32 solution. Our code is functionally similar up to the point of getting the perl -c exit code; then our methods diverge somewhat. Our general thinking about the problem was also very similar: don't let perl -c run for as long as it likes; let it run for as long as you like.

Here's my code.

#!/usr/bin/env perl

use strict;
use warnings;
use constant {
    TIMEOUT_USECS => 100_000,       # 100ms, in microseconds
    MIN_LINES_TO_ASSESS => 10,      # smaller files are not scored
    OUT_FILE => 'pm_11114214_minimal_is_perl_test.out',
};
use constant CMD_LINE => 'perl -c IN_FILE 2> ' . OUT_FILE;

use Time::HiRes 'ualarm';

for my $file (@ARGV) {
    my $cmd = CMD_LINE;
    $cmd =~ s/IN_FILE/$file/;

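    eval {
        # Abort the check if perl -c outlives the timeout.
        local $SIG{ALRM} = sub { die };
        ualarm TIMEOUT_USECS;
        `$cmd`;
        # A non-zero exit status means perl -c rejected the file.
        $? and die;
        ualarm 0;
        print "$file is valid Perl code.\n";
        1;
    }
    or do {
        # Timed out or failed to compile: cancel any pending alarm and guess.
        ualarm 0;
        heuristic_check($file);
    };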
}

sub heuristic_check {
    my ($file) = @_;

    if (-z OUT_FILE) {
        print "$file could be Perl code.\n";
    }
    else {
        my $file_lines = (split ' ', `wc -l $file`)[0];
        my $out_lines = (split ' ', `wc -l @{[OUT_FILE]}`)[0];

        if ($file_lines < MIN_LINES_TO_ASSESS) {
            print "$file is too small to assess. [$file_lines lines]\n";
        }
        elsif ($out_lines > $file_lines) {
            print "$file does not look like Perl code.\n";
        }
        else {
            printf "%s has a %.02f%% chance of being Perl code\n",
                $file, 100 * ($file_lines - $out_lines) / $file_lines;
        }
    }

    return;
}
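
One thing to be aware of with the backticks approach: when the alarm fires, the die happens in the parent, and as far as I can tell nothing signals the perl -c child, so it may keep running in the background. Here's a minimal sketch of a variant that forks explicitly and kills the child on timeout. This is a different technique from the code above, not a drop-in replacement; run_with_timeout is a hypothetical helper name, and discarding the child's output to /dev/null is an assumption made to keep the example short.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Time::HiRes 'ualarm';

# Hypothetical helper: run a command (as a list, so no shell is involved)
# for at most $timeout_usecs microseconds. Returns the wait status ($?)
# if the command finished in time, or undef if it was killed on timeout.
sub run_with_timeout {
    my ($timeout_usecs, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: discard output (an assumption for this sketch), then exec.
        open STDOUT, '>', '/dev/null' or exit 126;
        open STDERR, '>', '/dev/null' or exit 126;
        exec { $cmd[0] } @cmd or exit 127;
    }

    my $status;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        ualarm $timeout_usecs;
        waitpid $pid, 0;
        $status = $?;
        ualarm 0;
        1;
    };

    return $status if $finished;

    ualarm 0;
    die $@ unless $@ eq "timeout\n";

    # Timed out: stop the child and reap it so no zombie is left behind.
    kill 'KILL', $pid;
    waitpid $pid, 0;
    return;
}

for my $file (@ARGV) {
    my $status = run_with_timeout(100_000, $^X, '-c', $file);
    print "$file: ", (
        !defined $status ? 'timed out'
        : $status == 0   ? 'is valid Perl code'
        :                  'failed perl -c'
    ), "\n";
}
```

Passing the command as a list also sidesteps shell quoting problems with unusual filenames. The heuristic step would still need perl -c's stderr captured somewhere, which this sketch deliberately leaves out.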

With the timeout set to 1ms, it assessed itself as "a 98.18% chance of being Perl code"; at 10ms it was a tad more confident with "a 98.25% chance of being Perl code"; and at 100ms, it was sure, with "is valid Perl code".
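
To make the percentage less mysterious, here's the arithmetic from the final printf branch pulled out on its own. The helper name perl_likelihood and the line counts are made up for illustration; they are not the actual counts from my runs.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper mirroring the printf branch of heuristic_check():
# the share of input lines not matched by a line of perl -c error output.
sub perl_likelihood {
    my ($file_lines, $out_lines) = @_;
    return 100 * ($file_lines - $out_lines) / $file_lines;
}

# Illustrative numbers only: a 57-line file with 1 line of error output.
printf "%.02f%%\n", perl_likelihood(57, 1);     # prints 98.25%

# Same file with as many error lines as input lines.
printf "%.02f%%\n", perl_likelihood(57, 57);    # prints 0.00%
```

So a single line written to the output file is enough to pull a mid-sized script just below 100%, and the score only drops to zero when there is one line of complaint per line of input.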

I tested a tiny text file I had in my tmp directory. It wasn't Perl, and I had decided that files this small can't be assessed heuristically if they don't pass perl -c; accordingly, the output showed "too small to assess. [6 lines]". I also created a file containing just print "Hello, world!\n"; and that gave "hello.pl is valid Perl code." at 100ms but, at 10ms, the output was "hello.pl is too small to assess. [1 lines]".

And I tested a plain text file containing no Perl code: that gave "does not look like Perl code" at 1ms, 10ms and 100ms.

— Ken

In reply to Re: Fastest way to minimally check that file contains perl code? by kcott
in thread Fastest way to minimally check that file contains perl code? by DRVTiny
