
If a file has contents other than keyboard characters, then we call it a junk file.

For example, if we rename a zip file or a Microsoft Excel/PowerPoint file to .txt and then open that .txt file, we see a lot of garbled content; we consider such a file a junk file.

Re^3: find junk file
by Corion (Patriarch) on Jun 07, 2012 at 07:24 UTC

    Maybe you want to determine whether a file is a "text file" as opposed to a "binary file"? See -X in perlfunc for the -B and -T file test operators. Also note that UTF-8 encoded "text" files may look like "binary" files, depending on what kind of letters are on your keyboard. Also see http://www.daskeyboard.com/
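A minimal sketch of the -T/-B suggestion, assuming the file names arrive on the command line; `classify_file` is a hypothetical helper name. perlfunc describes -T and -B as heuristics that look only at the first block of the file, so the result is a guess, not a guarantee. An empty file passes both tests; because -T is checked first here, empty files are reported as text.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a file using Perl's built-in
# -T ("looks like text") file test.
sub classify_file {
    my ($file) = @_;
    return 'missing'   unless -e $file;
    return 'directory' if -d _;    # reuse the stat buffer from -e
    return -T $file ? 'text' : 'binary';
}

# Usage: perl classify.pl FILE...
print "$_: ", classify_file($_), "\n" for @ARGV;
```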

Re^3: find junk file
by thomas895 (Deacon) on Jun 07, 2012 at 07:26 UTC

    Oh, well, in that case, it's quite simple:

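        use strict;
        use warnings;
        use feature 'say';

        # These values may differ for you, depending on where you bought
        # your keyboard. There are some extra, non-keyboard chars in this
        # range, as well.
        use constant {
            HIGHEST_CHAR_ON_KBD => 126,
            LOWEST_CHAR_ON_KBD  => 9,
        };

        # File to check, given on the command line; read raw bytes.
        open my $fh, '<:raw', $ARGV[0] or die "Can't open $ARGV[0]: $!";

        LINE: while ( my $line = <$fh> ) {
            foreach my $char ( split //, $line ) {
                if (   ord($char) > HIGHEST_CHAR_ON_KBD
                    || ord($char) < LOWEST_CHAR_ON_KBD )
                {
                    say "It's a binary file";
                    last LINE;    # one bad character is enough; stop reading
                }
            }
        }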

    It isn't the best way of doing things, but it's a start.
    Update: I completely forgot about spaces, tabs, carriage returns, and line feeds.

    ~Thomas~
    confess( "I offer no guarantees on my code." );
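Following up on the update about whitespace, here is a regex-based variant of the same check that explicitly allows tab, line feed and carriage return alongside printable ASCII (0x20 to 0x7E). The allowed set and the helper name `has_junk` are assumptions for illustration. It tests each chunk with one regex instead of splitting into characters, and reads raw 4 KiB records so a binary file with no newlines is not slurped in one go.

```perl
use strict;
use warnings;

# Assumed "keyboard" set: tab, line feed, carriage return and
# printable ASCII 0x20..0x7E. Anything else counts as junk.
my $NOT_KEYBOARD = qr/[^\t\n\r\x20-\x7E]/;

# Hypothetical helper: returns 1 if the file contains any byte outside
# the allowed set, 0 otherwise.
sub has_junk {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    local $/ = \4096;    # read fixed 4 KiB records, not lines
    while ( my $chunk = <$fh> ) {
        if ( $chunk =~ $NOT_KEYBOARD ) {
            close $fh;
            return 1;
        }
    }
    close $fh;
    return 0;
}

for my $file (@ARGV) {
    print "$file: ", ( has_junk($file) ? "junk" : "clean" ), "\n";
}
```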

      thomas895:

      So a file is a text file unless someone uses a space?

          ...or a tab?

          ...or a carriage return?

          ...or a newline, escape sequence, ....?

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.
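The objection above is that a single stray control character (an escape sequence, say) shouldn't condemn an otherwise readable file. perlfunc describes -B as calling a file binary when the first block contains a NUL byte or when roughly a third of it is "strange" characters. This sketch applies that kind of ratio test with a tunable threshold; `looks_like_junk`, `JUNK_RATIO`, the 4096-byte block and the allowed character set are assumptions, not Perl's actual implementation. UTF-8 text with many non-ASCII letters will count as odd bytes here, which is the caveat about UTF-8 files raised earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical tunable threshold: fraction of "odd" bytes above which
# the file is called junk.
use constant JUNK_RATIO => 1 / 3;

# Returns 1 if the first $blocksize bytes of the file look like junk.
sub looks_like_junk {
    my ( $file, $blocksize ) = @_;
    $blocksize //= 4096;
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    my $got = read( $fh, my $buf, $blocksize );
    die "Can't read $file: $!" unless defined $got;
    close $fh;
    return 0 if $got == 0;                 # empty file: call it text
    return 1 if index( $buf, "\0" ) >= 0;  # any NUL byte: junk
    # Count bytes outside tab, LF, CR, FF, ESC and printable ASCII.
    my $odd = () = $buf =~ /[^\t\n\r\f\e\x20-\x7E]/g;
    return $odd / $got > JUNK_RATIO ? 1 : 0;
}

for my $file (@ARGV) {
    print "$file: ", ( looks_like_junk($file) ? "junk" : "text" ), "\n";
}
```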