Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

My program needs to read an input file with 3 columns (month date, file size, file name) and process it in 3 steps. As an example, I have sampled only 2 months of dates, in Julian day-of-year order.
  1. count the total number of files loaded in each month.
  2. sum the total file size for each month.
  3. find the largest file size in each month.
I need help to correct my code; when I run it, the output is not in individual month but in whole months. I also need the output in descending order by file size for each individual month. The month date and file name should stay matched with their file size, so that I can find the largest file along with its date and file name. I am not sure what code can handle this. I would much appreciate it if anyone could improve my code to reach a solution. Here is my input file format; the second column is the file size.
    001 175 FILENAME
    002 1856 FILENAME
    003 177 FILENAME
    032 175 FILENAME
    033 2345 FILENAME
    034 175 FILENAME
Here is my code:
    #!/usr/bin/perl
    use strict;
    use warning;
    use Data::Dumper;
    use File::Find;
    use File::stat;
    use sort 'stable';

    my $filin = '/root/scripts/newsort.in';
    my $fleot = '/root/scripts/results/size.out';
    open my $fh, $filin || die $!;
    open my $fot, ">$fleot" || die $!;

    ##Define month lengths
    @Janlen = ( '006', '007', '008', '009', '010', '011', '012', '013', '0 +14', '015', '016', '017', '018', '019', '020', '021', '022', '023', ' +024', '025', '026', '027', '028', '029', '030', '031' );
    @Feblen = ( '032', '033', '034', '035', '036', '037', '038', '039', '0 +40', '041', '042', '043', '044', '045', '046', '047', '048', '049', ' +050', '051', '052', '053', '054', '055', '056', '057', '058', '059' ) +;

    #Define month hash
    %mthlens = (@Janlen, @Feblen);
    my @julens = %mthlens;
    my $julias = @julens;
    my $Janlias = @Janlen;
    my $Feblias = @Feblen;
    my $Marlias = @Marlen;

    while (%mthlens=<$fh>){
        chomp;
        my %lengths = map { $_ => length $_ } %mthlens;
        while ( my ($Janlen,$length,$filename) = each %lengths) {
            @s = sort { $length{$b} <=> $length{$a}} keys %length;
            print join("\t", $Janlen, $length, $filename ), "\n";
        }
    }
[/code]
Here is my output file format. File sizes are displayed in the second column. I am not sure what are the numbers following FILENAME, such as 38, 33, 38 ....
[code]
    024 710 FILENAME 38
    114 923 FILENAME 33
    044 367 FILENAME 38
    083 7864 FILENAME 39
    153 783 FILENAME 33
    084 864 FILENAME
Any input is very much appreciated! Thank you!

Replies are listed 'Best First'.
Re: Help! Stuck on methods to count file size.
by Eily (Monsignor) on Oct 10, 2013 at 15:45 UTC

    while (%mthlens=<$fh>) runs only once, no matter how many lines are in your file, because <$fh> is evaluated in list context (i.e. it is assigned to a hash, which can take several elements, so all the lines are returned at once to fill it).

    chomp, like many other Perl functions, works on $_ by default if no argument is supplied. But since you never assign anything to $_, it is quite useless here. If you wanted your lines to be in $_ you could have written either while ($_=<$fh>) or while(<$fh>), and then you would have read line by line. You might want to use another variable instead though, with while (my $line = <$fh>) and then chomp $line.
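    A minimal sketch of the line-by-line read described above, contrasted with a list-context read. The sample rows and the helper name read_lines are made up for illustration, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Sample rows in the thread's three-column format (day, size, name).
my $data = "001 175 FILENAME\n002 1856 FILENAME\n003 177 FILENAME\n";

# Hypothetical helper: read a filehandle one line per iteration.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {   # scalar context: one line per pass
        chomp $line;             # strip the trailing newline from $line
        push @lines, $line;
    }
    return @lines;
}

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = read_lines($fh);
close $fh;

# For contrast: assigning <$fh> to a list slurps every line in one go.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @all_at_once = <$fh>;
close $fh;

printf "loop read %d lines; list assignment returned %d lines at once\n",
    scalar @lines, scalar @all_at_once;
```

    With a named variable such as $line, chomp has to be given that variable explicitly, because it only falls back to $_ when called with no argument.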

    while ( my ($Janlen,$length,$filename) = each %lengths) doesn't make much sense: each returns a list of two values, so if you assign it to three scalars, the third one will be undef.
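    To illustrate the point about each, here is a small sketch. The hash %size_of and its contents are hypothetical sample data, not taken from the original program.

```perl
use strict;
use warnings;

# Hypothetical sample: file name => size.
my %size_of = ('a.txt' => 175, 'b.txt' => 1856);

# each hands back exactly two values per call: a key and its value.
my @pairs;
while (my ($name, $size) = each %size_of) {
    push @pairs, "$name=$size";
}

# Asking for a third value leaves the extra variable undefined.
my ($key, $value, $extra) = each %size_of;

print join(', ', sort @pairs), "\n";
print defined $extra ? "extra is defined\n" : "extra is undef\n";
```

    If three fields per record are needed, they have to come from splitting each input line, not from each.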

    Actually running your code under strict and warnings, and not just adding those lines at the last moment to avoid being told to do so, would help you avoid most of the mistakes you are making, instead of waiting for an answer here.

      I will waste one element on my XP-per-node score denominator to say this, because I truly enjoyed the raw truth of it.

      Actually running your code under strict and warnings, and not just adding those lines at the last moment to avoid being told to do so, would help you avoid most of the mistakes you are making, instead of waiting for an answer here.

      OMG LOL !!!

      Thank you for the reminder! It's very valuable!
Re: Help! Stuck on methods to count file size.
by hippo (Archbishop) on Oct 10, 2013 at 15:13 UTC

    What you have posted is not the code which you are running. This I know because it doesn't compile. Since it isn't the code which you are running I am sure that you'll agree there's not a great deal of point trying to debug it.

    Please post the actual code which you are using so that it may be analysed.

      It's my code, I know it can not run so I posted here and ask for all guru's guidance. Thanks!
        "It's my code, I know it can not run so I posted here and ask for all guru's guidance." [my emphasis]

        But, in your OP, you wrote:

        "I need help to correct my code; when I run it, the output is not in individual month but in whole months." [my emphasis]

        Clearly, one of those two contradictory statements is false.

        -- Ken

Re: Help! Stuck on methods to count file size.
by toolic (Bishop) on Oct 10, 2013 at 15:20 UTC
    Unrelated to your question, but I think you can simplify:
    @Janlen = ( '006', '007', '008', '009', '010', '011', '012', '013', '0 +14', '015', '016', '017', '018', '019', '020', '021', '022', '023', ' +024', '025', '026', '027', '028', '029', '030', '031' );

    as:

    @Janlen = map { sprintf '%03d', $_ } 6 .. 31;

    Same for Feblen. Also, I think those pluses are an artifact of how you posted your code. Read Writeup Formatting Tips and post here again.

      @a = ('006' .. '031'); :)
        I tried this format before but it didn't work; most likely the problem is in my code. Thank you greatly for your input!
      I'll try it, I think it works, thank you very much for your posting.
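      For anyone wanting to compare the two suggestions, a quick sketch that builds the list both ways. It relies on the string-range behaviour that perlop documents for zero-padded values such as "01" .. "31"; the variable names are made up here.

```perl
use strict;
use warnings;

# Zero-padded day numbers via sprintf, as in the first suggestion.
my @via_sprintf = map { sprintf '%03d', $_ } 6 .. 31;

# The same list via the range operator on zero-padded strings.
my @via_range = ('006' .. '031');

my $same = "@via_sprintf" eq "@via_range" ? 'same' : 'different';
print scalar(@via_range), " elements, lists are $same\n";
print "first: $via_range[0], last: $via_range[-1]\n";
```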
Re: Help! Stuck on methods to count file size.
by Laurent_R (Canon) on Oct 10, 2013 at 16:57 UTC
    open my $fh, $filin || die $!;

    is not the best way to open a file (check the precedence of operators). Try this:

    open my $fh, "<", $filin or die $!;

    or:

    open (my $fh, "<", $filin) || die $!;
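    A small sketch of why the precedence matters. The helper name open_for_reading and the path are hypothetical, and the path is assumed not to exist so that the failure branch is exercised.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading or die with a useful message.
sub open_for_reading {
    my ($path) = @_;
    # Low-precedence "or" applies to the result of open itself.
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    return $fh;
}

# Assumed not to exist; used only to trigger a failed open.
my $missing = '/nonexistent/dir/newsort.in';

# With "||", die is attached to the file name (a true string) and never
# fires, so the failed open goes unnoticed unless its result is checked.
my $unchecked = open(my $fh1, '<', $missing || die "never reached");
print "unchecked open returned ", ($unchecked ? 'true' : 'false'), "\n";

# With "or", the failure is caught and reported.
my $ok = eval { open_for_reading($missing); 1 };
print "checked open ", ($ok ? "succeeded\n" : "died: $@");
```

    The parenthesised form in the unchecked case spells out how Perl groups the unparenthesised open my $fh, $filin || die $!; from the original post.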
      It's a really good suggestion, thank you for all the valuable suggestions.
Re: Help! Stuck on methods to count file size.
by marinersk (Priest) on Oct 10, 2013 at 15:35 UTC
    Hello.

    Here is my code:

    Hmm. I don't think so:

    C:\Steve\Dev\PerlMonks\P-2013-10-10@0927-MonthLen>monthlen.pl
    Can't locate warning.pm in @INC (@INC contains: C:\Steve\Perl C:/Perl/Perl-5.16.3.1603/site/lib C:/Perl/Perl-5.16.3.1603/lib .) at C:\Steve\Dev\PerlMonks\P-2013-10-10@0927-MonthLen\monthlen.pl line 3.
    BEGIN failed--compilation aborted at C:\Steve\Dev\PerlMonks\P-2013-10-10@0927-MonthLen\monthlen.pl line 3.

    I recommend you use a copy-and-paste feature to insert your code between proper <code> and </code> tags.

    Please note that </code> is not the same as the [/code] you used.

Re: Help! Stuck on methods to count file size.
by Lennotoecom (Pilgrim) on Oct 10, 2013 at 19:57 UTC
    well that's like super ugly
    but working:
    %jan = ('max' => 0, 'bytes' => 0, 'files' => 0);
    %feb = ('max' => 0, 'bytes' => 0, 'files' => 0);
    for (6..31) {$m[$_] = \%jan}
    for (32..59) {$m[$_] = \%feb}
    while(<DATA>){
        ($day, $value, $name) = split / /;
        print "$day $value\n";
        ${$m[$day]}{'max'} < $value ?
            (${$m[$day]}{'max'} = $value,
            ${$m[$day]}{'bytes'} += $value,
            ++${$m[$day]}{'files'}):(
            ${$m[$day]}{'bytes'} += $value,
            ++${$m[$day]}{'files'})
        ;
    }
    print "jan\n";
    foreach (sort keys %jan){
        print "$_ $jan{$_}\n";
    }
    print "feb\n";
    foreach (sort keys %feb){
        print "$_ $feb{$_}\n";
    }
    __DATA__
    006 175 FILENAME
    007 1856 FILENAME
    008 177 FILENAME
    032 175 FILENAME
    033 2345 FILENAME
    034 175 FILENAME
      Dear Lennotoecom, thank you very much for your beautiful code, it's really helpful!!!
      Great Monk, your code is Key->Important->Simple->Powerful! Thanks again!

      Dear Master, based on your previous code, I need to count each individual type of file, in addition to all the files, from the system via the txt file. I tried it as shown below, but it doesn't show the correct number for each type of file. Please point out the cause of the mistake or provide a correction, thanks!

      my $goodfile = ' good.txt '
      my $badfile = 'bad.txt'
      %jan = ('max' => 0, 'bytes' => 0, 'files' => 0, 'gfiles =>0, 'bfiles => 0);
      %feb = ('max' => 0, 'bytes' => 0, 'files' => 0, 'gfiles => 0, 'bfiles => 0);
      for (6..31) {$m[$_] = \%jan}
      for (32..59) {$m[$_] = \%feb}
      while(<DATA>){
          if($goodfile='gfiles')
          {($day, $value, $name) = split / /;
          print "$day $value\n";
          ${$m[$day]}{'max'} < $value ?
              ${$m[$day]}{'max'} = $value,
              ${$m[$day]}{'bytes'} += $value,
              ++${$m[$day]}{'gfiles'}):(
              ${$m[$day]}{'bytes'} += $value,
              ++${$m[$day]}{'gfiles'})
          ;
      }
      print "jan\n";
      foreach (sort keys %jan){
          print "$_ $jan{$_}\n";
      }
      print "feb\n";
      foreach (sort keys %feb){
          print "$_ $feb{$_}\n";
      }
      __DATA__
      006 175 FILENAME
      006 176 good.txt
      006 12 bad.txt
      007 1856 FILENAME
      007 1854 good.txt
      008 172 bad.txt
      008 177 FILENAME
      008 23 good.txt
      010 42 bad.txt
      032 175 FILENAME
      033 2345 FILENAME
      032 318 good.txt
      033 100 bad.txt
      034 175 FILENAME
        here's the code
        I also changed my previous mistakes with dereferencing
        feed back if there are any mistakes
        much appreciated
        %jan = ('max' => 0, 'bytes' => 0);
        %feb = ('max' => 0, 'bytes' => 0);
        for (6..31) {$m[$_] = \%jan}
        for (32..59) {$m[$_] = \%feb}
        while(<DATA>){
            ($day, $value, $name) = split /\s+|$/;
            print "$day $value $name\n";
            $m[$day]->{'max'} < $value ?
                ($m[$day]->{'max'} = $value,
                $m[$day]->{'bytes'} += $value,
                ++$m[$day]->{'files'}):(
                $m[$day]->{'bytes'} += $value,
                ++$m[$day]->{'files'})
            ;
            $m[$day]->{$name}++;
        }
        print "jan\n";
        foreach (sort keys %jan){
            print "$_ $jan{$_}\n";
        }
        print "feb\n";
        foreach (sort keys %feb){
            print "$_ $feb{$_}\n";
        }
        __DATA__
        006 175 FILENAME
        006 176 good.txt
        006 12 bad.txt
        007 1856 FILENAME
        007 1854 good.txt
        008 172 bad.txt
        008 177 FILENAME
        008 23 good.txt
        010 42 bad.txt
        032 175 FILENAME
        033 2345 FILENAME
        032 318 good.txt
        033 100 bad.txt
        034 175 FILENAME
      It's great post! It works nicely! But I need the file name following the max size too if possible... So much appreciated:-)
        I need pancakes
        well, then you should add another element "name" into the hashes
        and in the while-cycle if current max is found rewrite it,
        exactly the same as the 'max' itself
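        One possible way to follow that advice, as a sketch only: it is a strict-clean rewrite, not the code posted above. The helper name summarize and the file names in the sample rows are hypothetical; the day ranges and sizes follow the earlier examples in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise "day size name" rows per month, keeping the
# name of the largest file alongside its size.
sub summarize {
    my (@rows) = @_;
    my %stats = map { $_ => { files => 0, bytes => 0, max => 0, name => '' } }
                qw(jan feb);

    # Map day-of-year to its month entry (same ranges as earlier in the thread).
    my @month_for;
    $month_for[$_] = $stats{jan} for 6 .. 31;
    $month_for[$_] = $stats{feb} for 32 .. 59;

    for my $row (@rows) {
        my ($day, $size, $name) = split ' ', $row;
        next unless defined $name;           # skip blank or short rows
        my $m = $month_for[$day] or next;    # skip days outside the mapped ranges
        $m->{files}++;
        $m->{bytes} += $size;
        if ($size > $m->{max}) {             # new maximum: store size and name together
            $m->{max}  = $size;
            $m->{name} = $name;
        }
    }
    return \%stats;
}

# Sample rows; the file names are invented for illustration.
my @rows = split /\n/, <<'END';
006 175 jan_a.log
007 1856 jan_b.log
008 177 jan_c.log
032 175 feb_a.log
033 2345 feb_b.log
034 175 feb_c.log
END

my $stats = summarize(@rows);
for my $month (qw(jan feb)) {
    my $m = $stats->{$month};
    print "$month: files=$m->{files} bytes=$m->{bytes} max=$m->{max} name=$m->{name}\n";
}
```

        Updating 'max' and 'name' in the same branch keeps the two in step, so the reported name always belongs to the reported size.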