rsriram has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script to find out whether all the characters in a file are keyboard characters. To do that, I split each line into an array of characters, sort it, and check the ASCII values of the first, the second (I check this because the first will be the newline) and the last character. If a character is less than ASCII 31 or greater than 121, and is not 10 (the newline, i.e. line feed), the program should report an error. Something odd happens, though: I get the error message even when the character is within the range. Can you please help me out?

@splitted = split("",$_);
@sorted = sort(@splitted);
if ((ord($sorted[0]) < 31 && ord($sorted[0]) != 10) || ord($sorted[0]) > 121)
{
    print "Non-ASCII character found in the content\n";
}
if ((ord($sorted[1]) < 31 && ord($sorted[1]) != 10) || ord($sorted[1]) > 121)
{
    print "Non-ASCII character found in the content\n";
}
if ((ord($sorted[-1]) < 31 && ord($sorted[-1]) != 10) || ord($sorted[-1]) > 121)
{
    print "Non-ASCII character found in the content\n";
}

Or is there another way to find out whether a character is outside this range?

Sriram
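For comparison, here is a minimal sketch of the sort-based idea from the question with the bounds adjusted, assuming "keyboard characters" means printable ASCII (32 to 126) plus the newline at the end of each line. After sorting, only the smallest and largest characters need checking. The name `line_is_printable` is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every character of $line is printable
# ASCII (32..126), 0 otherwise. The trailing newline is removed first
# instead of special-casing character 10.
sub line_is_printable {
    my ($line) = @_;
    chomp $line;
    return 1 if $line eq '';       # nothing left to reject

    my @chars  = split //, $line;
    my @sorted = sort @chars;      # default string sort orders single chars by code point

    # Smallest and largest characters bound everything in between.
    return (ord($sorted[0]) >= 32 && ord($sorted[-1]) <= 126) ? 1 : 0;
}

print line_is_printable("hello, world\n"), "\n";   # prints 1
print line_is_printable("tab\there\n"), "\n";      # prints 0
```

Sorting costs O(n log n) per line where a single pass over the characters is linear, which is why scanning or matching is usually preferred.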

Re: Finding out non ASCII Characters in the text
by liverpole (Monsignor) on Jun 21, 2006 at 13:49 UTC
    Hi rsriram,

    It seems like you're doing a lot of work just to look for non-printable characters. (Actually, they're ALL ASCII characters, even the non-printable ones.) Also, I'm not sure why you're comparing with 121; the printable characters go all the way up to 126 (= "~"). (Thanks, ambrus: I knew that, but somehow my thinking was "off by one".)
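    To see why the 121 cutoff produces spurious errors, the sketch below lists the printable characters that the `> 121` test rejects; any line containing a lowercase z, for example, gets reported. (The `< 31` test has the opposite problem: it lets through character 31, which is a control character.)

```perl
use strict;
use warnings;

# Printable ASCII runs from 32 (space) to 126 (~). Collect the codes
# in that range that the original "ord(...) > 121" test would flag.
my @flagged = grep { $_ > 121 } 32 .. 126;

# Show them as characters rather than numbers.
print join(' ', map { chr($_) } @flagged), "\n";   # prints: z { | } ~
```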

    Instead of sorting, how about just applying the test to each character in the line?  Here's a subroutine which will return nonzero if the line contains non-printable characters, and zero otherwise:

    sub line_contains_non_printable {
        my ($line) = @_;
        chomp $line;    # the trailing newline from <$fh> is allowed
        my @split = split //, $line;
        foreach (@split) {
            my $ord = ord($_);
            return 1 if ($ord < 32 || $ord > 126);
        }
        return 0;
    }

    I'm sure there are several good modules for doing this as well (none that I know of offhand), which other monks may be able to recommend.


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      ~ is character 126; character 127 is a control character.
Re: Finding out non ASCII Characters in the text
by ptum (Priest) on Jun 21, 2006 at 13:53 UTC

    Wow, that seems a really brute-force way to detect a non-ASCII character. If all you need to do is detect one, I'd apply a regex to each line, maybe like in this (untested) example:

    while (<$file_handle>) {
        if (/[^[:ascii:]]/) {
            # do something
        }
    }
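    Note that `[:ascii:]` covers all 128 ASCII codes, control characters included, so that match only catches characters above 127. If the goal is the question's "keyboard characters", a narrower class works. A minimal sketch, assuming printable ASCII (0x20 to 0x7E) plus the newline is what is wanted; the in-memory sample text stands in for a real file:

```perl
use strict;
use warnings;

# Sample input held in memory so the sketch runs on its own; with a real
# file you would open it by name instead.
my $text = "plain line\nhas a\ttab\nok again\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @bad_lines;
while (my $line = <$fh>) {
    chomp $line;    # the newline itself is allowed
    # Anything outside space (0x20) .. tilde (0x7E) is either a control
    # character or a non-ASCII character. $. holds the current line number.
    push @bad_lines, $. if $line =~ /[^\x20-\x7e]/;
}
close $fh;

print "offending lines: @bad_lines\n";   # prints: offending lines: 2
```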

    No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: Finding out non ASCII Characters in the text
by prasadbabu (Prior) on Jun 21, 2006 at 13:43 UTC
Re: Finding out non ASCII Characters in the text
by Zaxo (Archbishop) on Jun 21, 2006 at 19:04 UTC

    There is a simpler way: count them with tr///.

    $_ = "a[bdy]dfjaP\xe9sdaf\xe9";    # two non-ASCII characters, written as \xe9 escapes
    print tr/\x0a\x20-\x7d//c, " unwanted characters in the string\n";
    __END__
    2 unwanted characters in the string
    The count is usable as a logical value, too: it is zero only when no unwanted characters are present.
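    A minimal sketch of that use, with one assumption changed: the upper bound here is `\x7e` rather than `\x7d`, so that `~` (126) counts as printable, as noted earlier in the thread. Because an empty replacement list makes `tr///` count without changing anything, its return value can be tested directly:

```perl
use strict;
use warnings;

# Sample lines; "\xe9" is a single non-ASCII character (e with acute accent).
my @lines = ("ordinary text\n", "caf\xe9\n", "tilde ~ is fine\n");

my @report;
for my $i (0 .. $#lines) {
    local $_ = $lines[$i];
    # Count characters that are neither newline nor printable ASCII.
    # The count is 0 (false) for a clean line.
    if (my $n = tr/\x0a\x20-\x7e//c) {
        push @report, sprintf('line %d: %d unwanted character(s)', $i + 1, $n);
    }
}
print "$_\n" for @report;   # prints: line 2: 1 unwanted character(s)
```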

    After Compline,
    Zaxo