Re: Recognizing Perl in text

Look for the “idioms” of the language: keywords such as sub, identifiers beginning with $, @, %, or $$, and operators such as ->. Score the text in favor of each candidate language you think might be present, giving a weight to each idiom you see, and take the candidate with the highest score.
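
A minimal Python sketch of that scoring idea follows. The idiom table, the regular expressions, and the weights are all made-up assumptions for illustration (nothing here is tuned), and the names IDIOMS, score and best_guess are hypothetical. Ties go to whichever candidate is listed first.

```python
import re

# Hypothetical idiom table: candidate language -> list of (pattern, weight).
# Patterns and weights are illustrative assumptions, not tuned values.
IDIOMS = {
    "perl": [
        (re.compile(r"\bsub\s+\w+\s*\{"), 3.0),  # named sub definition
        (re.compile(r"\$\$\w+"), 2.0),           # $$ref dereference
        (re.compile(r"[\$@%]\w+"), 1.0),         # sigil-prefixed identifiers
        (re.compile(r"->"), 0.5),                # arrow operator (also common in non-code notation)
    ],
    "english": [
        (re.compile(r"\b(the|and|of|to|is|that)\b", re.I), 1.0),  # common function words
    ],
}


def score(text):
    """Return a dict mapping each candidate language to its weighted idiom count."""
    return {
        lang: sum(weight * len(pattern.findall(text)) for pattern, weight in rules)
        for lang, rules in IDIOMS.items()
    }


def best_guess(text):
    """Return the candidate with the highest score (first listed wins a tie)."""
    scores = score(text)
    return max(scores, key=scores.get)
```

For example, best_guess('sub foo { my $x = shift; return $$x->{y}; }') comes out as perl, and an ordinary English sentence comes out as english. Because -> also turns up in non-Perl notation such as chemical equations, the sketch gives it only a small weight.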

Some idioms are nearly “show-stoppers.” For example, the presence of <?...?> pretty well screams (ick...) “PHP.”
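
Show-stoppers fit naturally as a check that runs before any weighted scoring: a single match decides the answer. Below is a minimal Python sketch, assuming a hypothetical SHOW_STOPPERS table and a caller-supplied fallback scorer. The lookahead that excludes xml is an added assumption, since <?xml ...?> declarations would otherwise trip the PHP rule.

```python
import re

# Hypothetical table of "show-stopper" idioms: one match settles the language.
# The (?!xml\b) lookahead is an added assumption so that XML declarations such
# as <?xml version="1.0"?> are not mistaken for PHP.
SHOW_STOPPERS = [
    (re.compile(r"<\?(?!xml\b)[\s\S]*?\?>"), "php"),
]


def classify(text, fallback):
    """Return the language of the first show-stopper found in text,
    or defer to fallback(text), e.g. a weighted idiom scorer."""
    for pattern, lang in SHOW_STOPPERS:
        if pattern.search(text):
            return lang
    return fallback(text)
```

For instance, classify('<?php echo "hi"; ?>', lambda t: "unknown") returns "php", while text without such a tag falls through to the fallback.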

Re^2: Recognizing Perl in text
by Anonymous Monk on Jan 06, 2011 at 17:47 UTC

That is in part what I was doing here. Some "other" languages:

  2H_2 + O_2 -> 2H_2O

  Ca-48 + n -> g + Ca-49 -> B- + g + Sc-49 -> B- + g + Ti-49

  X = R^{-1} D R

But the goal is to separate human-written language (English) from everything else. Perl happens to be a good example of something else when the document is a man page. Ideally, the documents wouldn't contain material like the examples above, but rather an instruction to import that material from elsewhere; skipping over the import instruction would be easy. But people will enter stuff like the above by hand anyway.
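
One crude way to do that separation line by line is to measure what fraction of whitespace-separated tokens look like ordinary words. The sketch below is a Python illustration only: the names is_wordlike, prose_ratio and split_prose, the two-letter minimum word length, and the 0.7 threshold are hypothetical choices. Real Perl can score fairly high with this measure, because keywords like return and shift look like English words.

```python
# Punctuation stripped from the ends of a token before testing it.
EDGE_PUNCT = ".,;:!?()\"'"


def is_wordlike(token):
    """True if token looks like an ordinary word: at least two letters once
    edge punctuation and inner apostrophes are removed."""
    word = token.strip(EDGE_PUNCT).replace("'", "")
    return len(word) >= 2 and word.isalpha()


def prose_ratio(line):
    """Fraction of whitespace-separated tokens in line that look like words."""
    tokens = line.split()
    if not tokens:
        return 0.0
    return sum(is_wordlike(t) for t in tokens) / len(tokens)


def split_prose(lines, threshold=0.7):
    """Partition non-blank lines into (prose, other) by prose_ratio."""
    prose, other = [], []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no evidence either way
        (prose if prose_ratio(line) >= threshold else other).append(line)
    return prose, other
```

Run on the thread's own examples, the chemical, nuclear, and matrix lines land in other, and the English sentence lands in prose.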