in reply to string occurences

Are you working on a Unix system? If so, you might want to use some of the available Unix tools. You could use "cut" to cut out the URL, pipe the results through "sort", and write the sorted output to a file. Once that sort is complete, you can easily count the occurrences of each URL without having to store a large number of lines or create many temp files. After the sort you could do something like:
#!/usr/bin/perl
# Untested
use strict;
use warnings;

my $current = '';
my $count   = 0;
while (<>) {
    chomp;
    if ($_ ne $current) {
        # URL changed: dump the finished group (skip the initial empty one)
        print "$current :: $count\n" if $current ne '';
        $current = $_;
        $count   = 1;
    }
    else {
        $count++;
    }
}
# Don't forget the final group
print "$current :: $count\n" if $current ne '';
You would invoke it at the command line as: ./foo.pl < sorted.file > file.count

Since the file is already sorted for you and contains only the URLs, all identical URLs will be grouped together. Therefore, once the URL changes you know you are done counting that particular URL. There is no need to keep anything in memory beyond the current URL and the current count; once the URL changes, you print out the count and move on to the next one.

Replies are listed 'Best First'.
Re: Re: string occurences
by runrig (Abbot) on Jun 12, 2001 at 22:06 UTC
    As long as you're going with a shell solution, you could just use:

    cat <file(s)> | cut ... | sort | uniq -c
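    As a concrete sketch of that pipeline (the log format here is made up for illustration; adjust the cut delimiter and field number to match your actual file):

    ```shell
    # Hypothetical log format: the URL is the 7th whitespace-separated field.
    # cut extracts it, sort groups identical URLs, and uniq -c counts each group.
    printf '%s\n' \
      'a b c d e f /index.html' \
      'a b c d e f /about.html' \
      'a b c d e f /index.html' > access.log
    cut -d' ' -f7 access.log | sort | uniq -c
    ```

    This prints each distinct URL once, preceded by its count, so no temp files or in-memory arrays are needed.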

      Thanks! I now know about "uniq -c", I did not before.
Re: Re: string occurences
by Anonymous Monk on Jun 13, 2001 at 00:00 UTC
    PERFECT! This is exactly what I was looking for, thank you very much. I did not want to use arrays as they began thrashing my VM. This seems the best way. -burhan