in reply to string occurences

Are you working on a Unix system? If so, you might want to use some of the available Unix tools. You could use "cut" to cut out the URL, pipe the results through "sort", and write the sorted output to a file. Once that sort is complete, you can easily count the occurrences of each URL without having to store a large number of lines or create many temp files. After the sort you could do something like:
#!/usr/bin/perl
# Untested
use strict;
use warnings;

my $current = '';
my $count   = 0;
while (<>) {
    chomp;
    if ($_ ne $current) {
        # URL changed: dump the finished group (skip the initial empty one)
        print "$current :: $count\n" if $current ne '';
        $current = $_;
        $count   = 1;
    }
    else {
        $count++;
    }
}
# Don't forget the final group
print "$current :: $count\n" if $current ne '';
You would invoke it at the command line as: ./foo.pl < sorted.file > file.count

Since the file is already sorted for you and contains only the URLs, all identical URLs will be grouped together. Therefore, once the URL changes you know you are done counting that particular URL. There is no need to keep anything in memory beyond the current URL and the current count; once the URL changes, you print out the count and move on to the next one.

Replies are listed 'Best First'.
Re: Re: string occurences
by runrig (Abbot) on Jun 12, 2001 at 22:06 UTC
    As long as you're going with a shell solution, you could just use:

    cat <file(s)> | cut ... | sort | uniq -c
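    As a concrete sketch of that pipeline (the log format here is made up for illustration; adjust the cut delimiter and field number to match your actual file):

    ```shell
    # Hypothetical log format: the URL is the 7th whitespace-separated field.
    # cut extracts it, sort groups identical URLs, and uniq -c counts each group.
    printf '%s\n' \
      'a b c d e f /index.html' \
      'a b c d e f /about.html' \
      'a b c d e f /index.html' > access.log
    cut -d' ' -f7 access.log | sort | uniq -c
    ```

    This prints each distinct URL once, preceded by its count, so no temp files or in-memory arrays are needed.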

      Thanks! I now know about "uniq -c", I did not before.
Re: Re: string occurences
by Anonymous Monk on Jun 13, 2001 at 00:00 UTC
    PERFECT! This is exactly what I was looking for, thank you very much. I did not want to use arrays as they began thrashing my VM. This seems the best way. -burhan