What matters is not how large the file is, but how many unique strings it contains.
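Something as simple as this (double-quoted for the Windows command line; use single quotes on a Unix shell so the shell leaves the $ variables alone):
perl -nle"++$h{ $1 } while m[(\S+)]g }{ print qq[$_ : $h{ $_ }] for keys %h" hugeFile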
will work quite well for any size of file, provided there are no more than the low tens of millions of unique strings to count.
If the number of unique strings is larger than that, you will likely run out of memory constructing the hash.
The other way to go is to use your system's sort utility to order the strings. You can then process the sorted file line by line, counting consecutive matches and printing each count as you go, without having to build a data structure to hold them all.
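A minimal sketch of that counting pass, assuming the input is already sorted with one string per line. The sub name count_sorted and the script name countsorted.pl are mine, purely for illustration. Only the current string and its running count are ever held in memory:

```perl
use strict;
use warnings;

# Count runs of identical consecutive lines read from an already-sorted
# filehandle, calling $emit->($string, $count) once per distinct string.
# (count_sorted is a hypothetical helper name, not an existing module.)
sub count_sorted {
    my ($in, $emit) = @_;
    my ($prev, $count);
    while (my $line = <$in>) {
        chomp $line;
        if (defined $prev && $line eq $prev) {
            ++$count;                              # same string: extend the run
            next;
        }
        $emit->($prev, $count) if defined $prev;   # run ended: report it
        ($prev, $count) = ($line, 1);              # start a new run
    }
    $emit->($prev, $count) if defined $prev;       # report the final run
    return;
}

# When run as a script, read sorted lines from STDIN and print the counts.
count_sorted(\*STDIN, sub { print "$_[0] : $_[1]\n" }) unless caller;
```

It might be used as: sort strings.txt | perl countsorted.pl (both file names are placeholders).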
If there are multiple strings per line, pre-process the file first: read it line by line, split each line into strings, and output them one per line. Then feed that to your system sort (without -u, which would discard the duplicates you need to count).
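The pre-processing step could look something like this. It is only a sketch, assuming strings are separated by whitespace (the same assumption the \S+ one-liner makes); split_to_lines and splitstrings.pl are hypothetical names:

```perl
use strict;
use warnings;

# Copy every whitespace-delimited string from $in to $out, one per line.
# Lines are handled one at a time, so memory use does not grow with file size.
# (split_to_lines is a hypothetical helper name.)
sub split_to_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # split ' ' skips leading whitespace and splits on runs of whitespace
        print {$out} "$_\n" for split ' ', $line;
    }
    return;
}

# When run as a script, filter STDIN to STDOUT.
split_to_lines(\*STDIN, \*STDOUT) unless caller;
```

On a system with the usual Unix tools, the whole pipeline might then be: perl splitstrings.pl < hugeFile | sort | uniq -c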
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.