If you were to analyze a man page, could you pick out which lines are Perl and which aren't, knowing that Perl is likely to appear? I've run across this problem before but, other than thinking about it, did nothing. I've now started to play with something. While some Perl lines score the same as ordinary text (typically < 1), most seem to score above that.
sub perl_score {
    my $line  = shift;
    my $score = 0;

    if( $line =~ m| #!\s*/usr/bin/perl| ) {
        if( length( $line ) > 40 ) {
            # In line of text, probably. Maybe see how close
            # to beginning of line, and if comment after?
            # $score = 0;
        } elsif( length( $line ) > 20 ) {
            $score = 1;
        } else {
            $score = 2;
        }
        return $score;
    }

    my @vars = split(/[-\s\(\)\{\}\[\]\<\>=\'\"\~]+/, $line );
    foreach my $v (@vars) {
        $score++ if( $v =~ /^[\$\@\%]/ );
        $score++ if( $v =~ /[a-zA-Z]+::[a-zA-Z]+/ );
    }

    my $t = 0;
    my @chars = qw|( ) { } [ ] = ;|;
    foreach my $c (@chars) {
        my $qc  = quotemeta( $c );
        my @tmp = split(/$qc/, $line );
        my $n   = $#tmp;
        $n = $n < 0 ? 0 : $n;
        $t += $n;
    }
    $score += $t / 2;    #?

    $score += 2 if( $line =~ /[=!]~/ );
    # Look for reserved words?

    return $score;
}
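One thing to watch in the punctuation loop: split drops trailing empty fields, so a delimiter at the very end of the line (typically the closing ";") is not counted. Here is a minimal sketch comparing the two approaches, assuming a plain character count is what's wanted. The names count_with_split and count_with_tr are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors the split-based counting in perl_score.
sub count_with_split {
    my $line = shift;
    my $t    = 0;
    foreach my $c ( qw|( ) { } [ ] = ;| ) {
        my $qc  = quotemeta( $c );
        my @tmp = split( /$qc/, $line );
        # split discards trailing empty fields, so a delimiter at the
        # end of the line adds nothing here.
        $t += $#tmp < 0 ? 0 : $#tmp;
    }
    return $t;
}

# Hypothetical alternative: tr/// with an empty replacement list leaves
# the string unchanged and returns how many of the listed characters occur.
sub count_with_tr {
    my $line = shift;
    return ( $line =~ tr/(){}[]=;// );
}

foreach my $line ( 'foo();', 'my @a = (1);' ) {
    printf "%-14s split=%d tr=%d\n",
        $line, count_with_split( $line ), count_with_tr( $line );
}
```

If the tr/// count were swapped in, the "$t / 2" weighting would probably need retuning, since every line ending in ";" would score a bit higher than it does now.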
In reply to Recognizing Perl in text by Anonymous Monk