in reply to Re^2: Dear Monks
in thread Dear Monks

I agree your original question was worded wrongly if you wanted output like that, but here's how I did it (you're lucky I was bored this morning...):
use strict;
my %hh;

# read in...
while (<DATA>) {
    my ($k, $v) = split /\s*,\s*/;
    chomp $v;
    $hh{$k}{$v}++;
}

# now to print out...
foreach my $k (sort {$a cmp $b} keys %hh) {
    my $tlist = '';
    my $tally = 0;
    foreach my $v (sort {$a cmp $b} keys %{$hh{$k}}) {
        $tally += $hh{$k}{$v};
        $tlist .= "$v, ";
    }
    chop $tlist; chop $tlist;    # remove trailing ', '
    print "$tally\t$k\t$tlist\n";
}

__DATA__
abcd, GB
abcd, UK
abcd, US
addd, US
(hopefully there aren't any typos. I wrote/tested on another machine and typed in by hand...)

Update: fixed a typo in the code: "== split" should be "= split".

Re^4: Dear Monks
by sivaraman (Initiate) on Mar 17, 2011 at 08:41 UTC
    Dear Monk,

    I am extremely sorry for not describing the problem clearly. Your previous suggestion was really helpful. Our file size is more than 30 million, so the script throws an "Out of memory" error. Kindly suggest how to resolve this issue. Thank you in advance.

      sivaraman,

      Thank you for recognizing that your lack of description is leading us into providing inadequate solutions. You also have to recognize that changing requirements (5 million to 30 million is a significant difference) will also lead to us wasting time and energy.

      You still have done an inadequate job of describing all of the requirements in order for us to provide a solution that meets your requirements. Each time a monk provides a new solution based on your "it didn't work because X", you reply with "that didn't work because Y". In order for us to help, you need to define all the parameters of the problem first. I mentioned a number of things in Re^3: Dear Monks. Since your current issue seems to be memory, consider a few more: What operating system? How much physical memory? Is perl 32 or 64 bit?
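
      To answer the operating system and 32/64 bit questions above, something like the following rough sketch could be run with the same perl that runs the script. The function name perl_build_info is made up for illustration; $^O and the Config keys ptrsize and archname are standard. Physical memory has no portable core-Perl query, so it is left out here.

```perl
use strict;
use warnings;
use Config;    # core module describing how this perl was built

# Hypothetical helper: collect the facts asked for above.
sub perl_build_info {
    return (
        os       => $^O,                      # e.g. linux, MSWin32, darwin
        bits     => 8 * $Config{ptrsize},     # pointer size: 32 or 64
        archname => $Config{archname},        # build architecture string
    );
}
```

      Usage: my %info = perl_build_info(); print "$_: $info{$_}\n" for sort keys %info;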

      There is a relatively simple solution if order doesn't matter, but since you haven't removed that as a constraint for us, it is rather difficult to guess what will satisfy your unwritten requirements.

      Additionally, you have to understand that PerlMonks is not a free script writing service. We expect you to show effort. If you don't know where the perl documentation is, please see Perldoc online, though a local copy was probably installed and is available from the command line. With that said, I would be happy to provide a solution to you once you do a better job of describing the requirements, but I am not going to keep guessing with "try this".

      Cheers - L~R

      Have a look at DB_File or AnyDBM_File ... you will need to store the hash on disk instead of in memory if it's too big to fit there.
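
      A minimal sketch of that idea, not a tested solution for a 30 million line file: it ties a hash to a DBM file through AnyDBM_File, which uses whichever DBM backend this perl has. DBM values are flat strings, so the nested hash from the earlier script is replaced by one record per key of the form "tally|v1,v2,...". The function name tally_on_disk, the record layout, and the assumption that values never contain "," or "|" are mine. Some backends (SDBM_File, for one) limit the record size, and the final sort still holds every distinct key in memory.

```perl
use strict;
use warnings;
use Fcntl;          # O_RDWR, O_CREAT
use AnyDBM_File;    # picks an installed backend (NDBM_File, DB_File, GDBM_File, ...)

# Hypothetical helper: read "key, value" lines from $in, keep one record per
# key in a DBM file at $db_path, and return the report lines sorted by key.
sub tally_on_disk {
    my ($in, $db_path) = @_;

    tie my %seen, 'AnyDBM_File', $db_path, O_RDWR | O_CREAT, 0644
        or die "Cannot tie $db_path: $!";

    while (my $line = <$in>) {
        chomp $line;
        my ($k, $v) = split /\s*,\s*/, $line, 2;
        next unless defined $v;    # skip lines without a comma

        # Record layout (an assumption of this sketch): "tally|v1,v2,..."
        my ($tally, $list) = split /\|/, ($seen{$k} // '0|'), 2;
        my %values = map { $_ => 1 } split /,/, $list;
        $values{$v} = 1;
        $seen{$k} = ($tally + 1) . '|' . join(',', sort keys %values);
    }

    my @report;
    for my $k (sort keys %seen) {    # key list is still built in memory
        my ($tally, $list) = split /\|/, $seen{$k}, 2;
        push @report, "$tally\t$k\t" . join(', ', split /,/, $list);
    }
    untie %seen;
    return @report;
}
```

      Usage: print "$_\n" for tally_on_disk(\*STDIN, 'tally_db'); where 'tally_db' is a placeholder path. If DB_File is installed, tying with its $DB_BTREE format instead keeps keys in sorted order on disk, which would avoid the in-memory sort.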

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

      If I were you, I'd sort the file using a system utility like the Linux sort command (as sundial and others have suggested). Then all your "abcd" lines would already be grouped. You could then print them as you encountered them, and only track one "abcd" symbol at a time, writing back to a file or to the screen each time the symbol changed.

      If your data took this form:

      abcd, GS
      abcd, GT
      abcd, HI
      abcd, HI
      abcd, UK
      abcd, US
      abce, AK
      abce, AZ
      abce, GB
      abcf, UT
      abcf, US

      Can you see how you'd not need to keep track of every symbol (abcd, abce, abcf) all in one hash at once? You could simply read a line at a time and tally appropriately, and every time you noticed that you were no longer reading abcd, but some different symbol, you'd just print what you had and reinitialize the tracking set for the new symbol...
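
      A rough sketch of that loop, assuming the input has already been sorted so that identical symbols are adjacent. The function name tally_sorted and the callback interface are made up for illustration. Only the current symbol's tally and values are held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: stream "key, value" lines from $in (pre-sorted by key)
# and call $emit->($tally, $key, \@sorted_values) once per key.
sub tally_sorted {
    my ($in, $emit) = @_;
    my ($current, $tally, %values);

    # Hand the finished group to the caller (no-op before the first line).
    my $flush = sub {
        $emit->($tally, $current, [sort keys %values]) if defined $current;
    };

    while (my $line = <$in>) {
        chomp $line;
        my ($k, $v) = split /\s*,\s*/, $line, 2;
        next unless defined $v;    # skip lines without a comma

        if (!defined $current or $k ne $current) {
            $flush->();            # symbol changed: emit the previous group
            $current = $k;         # ...and reinitialize the tracking set
            $tally   = 0;
            %values  = ();
        }
        $tally++;
        $values{$v} = 1;           # duplicates collapse into one entry
    }
    $flush->();                    # emit the last group
}
```

      Usage: tally_sorted(\*STDIN, sub { my ($n, $k, $vals) = @_; print "$n\t$k\t", join(', ', @$vals), "\n" }); fed by something like sort -t, -k1,1 input.txt, where input.txt is a placeholder name.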

      Why don't you try that, and if you still can't get it, come back and ask more questions. Good Luck... --Ray