mxtime has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm very new to Perl and am in the middle of a problem.
I'm trying to find the largest file in a directory within each of two groups (files beginning with d1 and files beginning with d2), sorting by size and then using awk to strip off the file size so that only the file name is left.


Running Solaris
Example setup:
directory: /tmp/dir1
files: d1file1.txt (5 KB), d1file2.txt (10 KB), d1file3.txt (7 KB) and d2file1.txt (10 KB), d2file2.txt (5 KB), d2file3.txt (7 KB)
Here is the code:

    my $var1 = `du -k /tmp/dir1/d1* | sort -rn | head -1 | awk '{print $2}'`;

And then I would do the same for the d2 files.
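
For reference, the matching call for the d2 group would presumably look like the sketch below. (Note that inside Perl backticks $2 is interpolated by Perl itself, so it likely needs to be escaped as \$2 for awk to see it.)

    my $var2 = `du -k /tmp/dir1/d2* | sort -rn | head -1 | awk '{print \$2}'`;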

I'm sure there are better/easier ways to do it, but I am trying with what I know the most. (which is not much!)

Re: find biggest file and use awk
by graff (Chancellor) on May 21, 2014 at 04:01 UTC
    If all the files of interest are in one directory, and they all match a simple file-name pattern, and you just want to print the name of the largest file, here's one easy way to do that in Perl:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $path = "/tmp/dir1";

    # map each matching file name to its size in bytes
    my %filesize = map { $_ => -s } <$path/d1*>;

    # sort the names by size, largest first
    my @sorted = sort { $filesize{$b} <=> $filesize{$a} } keys %filesize;

    printf "Largest file: %s (%d bytes)\n", $sorted[0], $filesize{$sorted[0]};
    Of course, I expect it would be better to have the path and file-name pattern of interest be a command line argument (or a sensible default, like all files in the current working directory), because hard-coding this in the script is bothersome. So I'd rather do it like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $glob_pattern = shift || './*';

    my %files;
    for ( glob( $glob_pattern )) {
        $files{$_} = -s _ if ( -e );
    }

    if ( scalar keys %files == 0 ) {
        warn "No files matched $glob_pattern\nUsage: $0 [path/name*]\n";
        exit(1);
    }

    my @sorted = sort { $files{$b} <=> $files{$a} } keys %files;
    printf( "Largest file that matches %s is %s (%d bytes)\n",
            $glob_pattern, $sorted[0], $files{$sorted[0]} );
    Now, in that case, when I run the script, I have to put the command-line argument in quotes, because otherwise, the shell will do the glob expansion, and my script will only see the first file name that matches the glob. In other words, if the script is called "show-biggest", the command line would have to be:
    show-biggest '/tmp/dir1/d1*'    # note the single quotes
    # or:
    show-biggest /tmp/dir1/d1\*     # note the backslash escape for "*"
    BTW, in trying this out, I learned that there is a subtle difference between this:
    my @files = <something>;
    and this:
    my $glob = "something"; my @files = glob( $glob );
    In the first approach, if "something" doesn't match anything, @files will be empty, but in the second approach, it will have one element, which is the string that was passed to the glob() function. The difference goes away if the value of $glob contains any wild-card characters (* or ? or square brackets) - I haven't checked, but I'll bet this is documented behavior... That's why I added a test for file existence (-e) in my second version of the script above.

    I also learned that glob( $glob_pattern ) does the right thing, where <$glob_pattern> doesn't. (Perl treats the latter as an unopened file handle.)

Re: find biggest file and use awk
by johngg (Canon) on May 20, 2014 at 23:06 UTC

    You could use the core File::Find module and the stat function to scan a directory looking for the file taking up most disk blocks (512 bytes on many file systems though not all). Running the following code in my "Downloads" directory finds that a large CentOS ISO is the guilty culprit.

    $ perl -MFile::Find -Mstrict -Mwarnings -E '
        my %largest = ( name => q{}, size => 0 );
        find(
            sub {
                my $blocks = ( stat )[ 12 ];
                do {
                    $largest{ name } = $File::Find::name;
                    $largest{ size } = $blocks;
                } if $blocks > $largest{ size };
            },
            q{.} );
        say qq{$largest{ name } - $largest{ size } blocks};'
    CentOS-5.10-x86_64-bin-DVD-1of2.iso - 9125976 blocks
    $

    I hope this is helpful.

    Update: Substituted $File::Find::name for $_ to store the full path rather than just the file name.

    Cheers,

    JohnGG

Re: find biggest file and use awk
by Anonymous Monk on May 20, 2014 at 21:34 UTC

    Your subject says that you want to use awk, which your code already does, and it works for me. You haven't said what the actual problem is.

    Although you could just use the code you have, I'm guessing your question is how to do this in Perl.

    Piece 1: List files with glob (there are other ways, such as readdir or Path::Tiny, but glob is good for simple tasks), e.g. glob('/tmp/dir1/d1*')

    Piece 2: Get the size of a file with -s, e.g. -s $filename

    Piece 3: Use an array of arrays to hold the filenames and sizes (see also perlreftut)

    Piece 4: Sort numerically with sort, e.g. sort {$b<=>$a} @filesizes

    Putting it all together:

    my @filenames = glob '/tmp/dir1/d1*';

    my @files_and_sizes;
    for my $filename (@filenames) {
        my $filesize = -s $filename;
        push @files_and_sizes, [$filename, $filesize];
    }

    use Data::Dumper;                    # just for demo & debugging
    print Dumper \@files_and_sizes;      # just for demo & debugging

    @files_and_sizes = sort { $$b[1] <=> $$a[1] } @files_and_sizes;
    print Dumper \@files_and_sizes;      # just for demo & debugging

    my $var1 = $files_and_sizes[0][0];
    print $var1;

    This could even be dramatically shortened into a one-liner, especially if you replace the for loop with map, as in the sketch below. There are also several other ways to do this, e.g. with modules.
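
    A minimal sketch of that shortened form (same hard-coded glob pattern as above; the inner map builds [name, size] pairs, sort puts the biggest first, and the list assignment keeps just the first name):

    my ($biggest) = map  { $$_[0] }
                    sort { $$b[1] <=> $$a[1] }
                    map  { [ $_, -s $_ ] }
                    glob '/tmp/dir1/d1*';
    print "$biggest\n";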

Re: find biggest file and use awk
by wjw (Priest) on May 20, 2014 at 21:21 UTC

    Check whether File::Util is available to you; I believe it will provide what you want if it is. You can find out by running instmodsh, or with a quick one-liner like the one below.
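
    For instance (just one way to check, not the only one), this prints a message only if the module loads, and dies with "Can't locate File/Util.pm in @INC ..." otherwise:

    perl -MFile::Util -e 'print "File::Util is installed\n"'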

    Hope that gives you a start...

    Update:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Util;

    my($f) = File::Util->new();

    sub filesize {
        my ( $selfdir, $subdirs, $files, $depth ) = @_;
        print "$_ " . ( -s $_ ) . "\n"
            for sort { (-s $a) <=> (-s $b) } @$files;
    }

    $f->list_dir( '/tmp/dir1' => {
        recurse  => 0,
        callback => \&filesize,
        pattern  => '\.pl$',
    } );

    Update 2: Forgot to mention that this code is right from the POD for the File::Util module. It is not mine!

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

Re: find biggest file and use awk
by Anonymous Monk on May 20, 2014 at 21:29 UTC
    See the code from Re^3: Getting modification time in perl not working on my MAC:
    use Path::Tiny qw/ path /;

    my @all_files = path( $dir )->realpath->children();

    my @done = map  { $$_[1] }
               sort { $$a[0] <=> $$b[0] }
               map  { [ $_->stat->mtime, $_ ] }
               grep m{d2[^\\/]+$}, @all_files;

    my @dtwo = map  { $$_[1] }
               sort { $$a[0] <=> $$b[0] }
               map  { [ $_->stat->mtime, $_ ] }
               grep m{d1[^\\/]+$}, @all_files;
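
    For the question asked here (largest file per group rather than sorting by modification time), a minimal adaptation of the above might look like this. It swaps ->stat->mtime for ->stat->size, sorts largest first, and keeps only the first entry of each group; $dir is assumed to be /tmp/dir1:

    use Path::Tiny qw/ path /;

    my $dir       = '/tmp/dir1';
    my @all_files = path( $dir )->realpath->children();

    my ($largest_d1) = map  { $$_[1] }
                       sort { $$b[0] <=> $$a[0] }
                       map  { [ $_->stat->size, $_ ] }
                       grep m{d1[^\\/]+$}, @all_files;

    my ($largest_d2) = map  { $$_[1] }
                       sort { $$b[0] <=> $$a[0] }
                       map  { [ $_->stat->size, $_ ] }
                       grep m{d2[^\\/]+$}, @all_files;

    print "$largest_d1\n$largest_d2\n";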
Re: find biggest file and use awk
by mxtime (Initiate) on May 21, 2014 at 20:30 UTC
    Thanks all for the great info. You guys are amazing! I really expected to get shot down and told to do more newb research, but you all really helped! Thanks!