shamala has asked for the wisdom of the Perl Monks concerning the following question:

hi Monks!! I've been around here (actively) for the last week, and it has been a great learning experience. Before I can be of help with other people's queries, I guess I have a long way to go; here's the path, and I need your help to walk it. Below is only the logic of the program; I need your help to code it in Perl.
    if $flag then $refsize = `getsize()`   # getsize is a subprogram that gets the size of the file
        $newsize = $refsize;

    sub prog [                             # this is one group
        $v = 1;
        loop ($repeat times)
        <loop begin>
            $size = getsize();
            if $size == $newsize
                $sizeT += $size;           # cumulative addition
                $samefile++;
            else
                $newsize = $size; $flag = 0;
                continue with the loop;
        <loop end>
        display $samefile with $size.
        delete all the files bearing this size (using a reverse hash and delete with the size as the key).
        $num = the number of $samefile;
        $repeat = 128 - $num;
        call sub prog if $repeat != 0;
        else $repeat = 128 and $v++;       # go to the next group of 128 files
    ]

  • Well, this is what I am trying to do. I have files organized into groups; there are five such groups.
  • I want to traverse each group, find all the files in it that have the same size, and list them as "this group contains n files with size x".
  • This continues until all the files are grouped and listed according to their sizes.
  • The same process runs for all five groups.
  • Do you think this logic would work?

    For example:

  • I have a group of (unique) files, say a, b, c, with sizes x, x, y.
  • I want the output to show "There are 2 files with size x, total size being 2x".
  • All the groups should show this output.
  • Each group contains 128 files.
  • Thanks so much!
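
    A minimal sketch of that logic in Perl, assuming each group is a directory of files (the directory names below are hypothetical):

    use strict;
    use warnings;

    # Hypothetical group directories; substitute the real five groups.
    my @groups = map { "group$_" } 1 .. 5;

    for my $group (@groups) {
        opendir my $dh, $group or die "Cannot open $group: $!";
        my @files = grep { -f "$group/$_" } readdir $dh;
        closedir $dh;

        # Hash of arrays: the key is a size, the value is the list of
        # files that have that size (the "reverse hash" from above).
        my %by_size;
        push @{ $by_size{ -s "$group/$_" } }, $_ for @files;

        for my $size ( sort { $a <=> $b } keys %by_size ) {
            my $n = @{ $by_size{$size} };    # file count, in scalar context
            print "$group contains $n file(s) with size $size, ",
                  "total size ", $n * $size, "\n";
        }
    }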

    Replies are listed 'Best First'.
    Re: Getting started -- hashes!
    by davido (Cardinal) on May 15, 2004 at 09:13 UTC
      Well, the basic design philosophy I might consider would be to push each file of a given size into an anonymous array referenced by a hash element whose key is the size. That way all files of the same size can be automatically grouped together by size. Then counting the files would be as simple as accessing a given hash element as a whole array, in scalar context, so that the value obtained is the array element count. You mentioned using a reverse hash where the key is the size, so you're on the right track in that respect. Here's an example:

      my %entries;

      while ( my $entry = <DATA> ) {
          chomp $entry;
          my ( $size, $name ) = split /\s+/, $entry;
          push @{ $entries{$size} }, $name;
      }

      local $" = "\t";

      print scalar( @{ $entries{$_} } ), ": @{$entries{$_}}\n"
          for sort { $a <=> $b } keys %entries;

      __DATA__
      12 filename1
      14 filename2
      14 filename3
      11 filename4
      14 filename5
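
      For the sample __DATA__, that would print the size groups in ascending order (names tab-separated), something like:

      1: filename4
      1: filename1
      3: filename2	filename3	filename5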

      Obviously this isn't a cut-n-paste solution, but it's the start of an example of a Perlish solution.

      Sometimes it's all about finding the right data structure.

      You may need to learn a little Perl to get it all implemented though.


      Dave

    Use your language's built-in features
    by TomDLux (Vicar) on May 15, 2004 at 21:32 UTC

      You have ...

      if $flag then $refsize = `getsize()`   # getsize is a subprogram
                                             # that gets the size of the file
          $newsize = $refsize;

      How does getsize() know which file to process?

      When you call getsize() with backticks, a new shell is started up and configured, the getsize program is loaded from disk or from cache, and then it runs. That takes a relatively long time, and it happens often: once for each file. Why not use the built-in stat() function? You still have one disk access either way, but you save an unnecessarily wasted quarter second per file. Optimizing too early is a mistake, but there's no need to throw away time doing things that are already built into your language.
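
      A minimal sketch of the difference (the file name here is hypothetical):

      use strict;
      use warnings;

      my $file = 'somefile.dat';    # hypothetical file name

      # Slow: backticks start a shell and run an external program,
      # once per file:
      #   my $size = `getsize $file`;

      # Fast: the built-in -s file test returns the size in bytes
      # from a single stat(2) call, with no subprocess:
      my $size = -s $file;
      die "Cannot stat $file: $!" unless defined $size;

      # Equivalent, using stat() directly; the size is field 7:
      my $size_again = ( stat $file )[7];

      print "$file is $size bytes\n";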

      --
      TTTATCGGTCGTTATATAGATGTTTGCA