comment on

Perl is notorious for being able to parse stuff which looks like garbage, there's a whole category for Obfuscated code on PerlMonks. So let's hope your Perl programmers do this in less than 20% of their files ;)

For the general task of classifying data, there's AI::NaiveBayes and AI::Categorizer. They both need some adaption to parse text into the categories "Perl source code" and "garbage". I would guess that you get 80% accuracy with a filter based on the regular expressions presented by other monks, so only if this fails, training a Bayesian might be an alternative.

In reply to Re: Fastest way to minimally check that file contains perl code? by haj
in thread Fastest way to minimally check that file contains perl code? by DRVTiny

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.