shan_emails has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: find junk file
by marto (Cardinal) on Jun 06, 2012 at 16:40 UTC
Re: find junk file
by BrowserUk (Patriarch) on Jun 06, 2012 at 17:14 UTC

    Given your screen name, perhaps this is the sort of thing you are looking for?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re: find junk file
by MidLifeXis (Monsignor) on Jun 06, 2012 at 16:54 UTC

    Conversely (to Re: find junk file), if you can identify a 'good' file, anything else is a 'junk' file.

    --MidLifeXis

Re: find junk file
by thomas895 (Deacon) on Jun 06, 2012 at 16:51 UTC

    If by "junk file" you mean "a file consisting of random characters", then you could use a spelling checker and run the file through that. Then, if it finds, say, more than 50 errors, you classify it as a "junk file".
    Is that what you mean?
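
    A minimal sketch of that spell-checker heuristic, assuming a dictionary is already loaded into a hash. The helper names count_unknown_words and looks_like_junk and the tiny word list are hypothetical; the default limit of 50 is the figure suggested above. It only makes sense for files that are expected to hold natural-language text.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the words in $text that are missing from
# the dictionary hash %$dict (keys are lowercase words).
sub count_unknown_words {
    my ( $text, $dict ) = @_;
    my $unknown = 0;
    for my $word ( $text =~ /([A-Za-z']+)/g ) {
        $unknown++ unless exists $dict->{ lc $word };
    }
    return $unknown;
}

# Hypothetical helper: the text counts as "junk" when more than $limit
# words are unknown. The default of 50 is the figure from the post.
sub looks_like_junk {
    my ( $text, $dict, $limit ) = @_;
    $limit = 50 unless defined $limit;
    return count_unknown_words( $text, $dict ) > $limit ? 1 : 0;
}

# Toy dictionary for illustration only; a real word list would be needed.
my %dict = map { $_ => 1 } qw( the quick brown fox jumps over lazy dog );
print count_unknown_words( "The quick brown fox", \%dict ), "\n";    # prints 0
print count_unknown_words( "Xq zzv brown kpw",    \%dict ), "\n";    # prints 3
```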

    ~Thomas~
    confess( "I offer no guarantees on my code." );

      If a file has contents other than keyboard characters, we call it a junk file.

      For example, if we rename a zip file or a Microsoft Excel/PowerPoint file to .txt and then open that .txt file, we see a lot of garbage content. We consider such a file a junk file.

        Maybe you want to determine whether a file is a "text file" as opposed to a "binary file"? See -X for the -B and -T operators. Also note that UTF-8 encoded "text" files may look like "binary" files, depending on what kind of letters are on your keyboard. Also see http://www.daskeyboard.com/
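
        A minimal sketch of the -T / -B approach. The helper name classify_file is hypothetical; the file tests themselves are Perl built-ins. Both tests are heuristics that inspect only the first block of the file, so treat the answer as a guess rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: classifies a path with Perl's -T / -B heuristics.
sub classify_file {
    my ($path) = @_;
    return 'missing'          unless -e $path;
    return 'not a plain file' unless -f _;    # reuse the stat data from -e
    return 'empty'            if -z _;        # an empty file passes both -T and -B
    return -T $path ? 'text' : 'binary';
}

# Classify every file named on the command line.
print "$_: ", classify_file($_), "\n" for @ARGV;
```

        A renamed zip or Excel file would normally come back as 'binary' here, because such files contain zero bytes near the start.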

        Oh, well, in that case, it's quite simple:

        use feature 'say';

        # These values may differ for you, depending on where you bought
        # your keyboard. There are some extra, non-keyboard chars in this
        # range, as well.
        use constant {
            HIGHEST_CHAR_ON_KBD => 126,
            LOWEST_CHAR_ON_KBD  => 9,
        };

        # Assumes FILE is already open for reading.
        LINE: while ( <FILE> ) {
            foreach ( split( "", $_ ) ) {
                if ( ( ord($_) > HIGHEST_CHAR_ON_KBD ) || ( ord($_) < LOWEST_CHAR_ON_KBD ) ) {
                    say "It's a binary file";
                    last LINE;    # stop reading at the first out-of-range character
                }
            }
        }

        It isn't the best way of doing things, but it's a start.
        Update: I completely forgot about spaces, tabs, carriage returns, and line feeds.
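
        Following on from that update, one way to avoid forgetting the whitespace characters is to list the allowed bytes explicitly in a character class. This is only a sketch: the helper name has_non_keyboard_bytes and the 4096-byte chunk size are assumptions, and "keyboard characters" is taken here to mean tab, line feed, carriage return, and printable ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $path contains any byte other than
# tab, line feed, carriage return, or printable ASCII (space to tilde).
sub has_non_keyboard_bytes {
    my ($path) = @_;
    open( my $fh, '<:raw', $path ) or die "Cannot open $path: $!";
    local $/ = \4096;    # read fixed-size chunks instead of lines
    while ( my $chunk = <$fh> ) {
        return 1 if $chunk =~ /[^\x09\x0A\x0D\x20-\x7E]/;
    }
    return 0;
}

# Report on every file named on the command line.
print has_non_keyboard_bytes($_) ? "junk" : "text", ": $_\n" for @ARGV;
```

        Note that under this definition a UTF-8 file with accented letters is also reported as junk, since those letters are stored as bytes above 126.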

        ~Thomas~
        confess( "I offer no guarantees on my code." );