Start with this:

#!/usr/bin/perl
use strict;
use warnings;

# pseudo code from here on
# open the file with categories
# read the categories in (probably to a hash where the key is the category and the value 0)
# close the categories file (you have your hash, you don't need to read from the file anymore)
# open the file with 3000k entries
# read the file line by line
# for each line read
#     trim to the first 8 characters
#     look for that value in the hash keys
#     increment the value of the hash key that is matched, if any
# you now have a hash with category as the key, and the number found as the value
# you should be able to figure out how to find the top 100 values and print out the key and value for each of them, or store them to a file

You can do this assignment with nothing but the basic Perl functionality.
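
If it helps to see those steps end to end, here is one possible sketch using only core Perl. It is not the only way to do it. The sub names (load_categories, count_entries, top_n) and the sample data are made up for illustration, and in-memory filehandles stand in for your two real files so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one category per line into a hash of category => 0.
sub load_categories {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        $count{$line} = 0;
    }
    return \%count;
}

# Hypothetical helper: trim each entry to its first $prefix_len characters
# and increment the matching category, if there is one.
sub count_entries {
    my ( $fh, $count, $prefix_len ) = @_;
    while ( my $line = <$fh> ) {
        chomp $line;
        my $key = substr $line, 0, $prefix_len;
        $count->{$key}++ if exists $count->{$key};
    }
    return $count;
}

# Hypothetical helper: return up to $n categories, highest count first
# (ties broken alphabetically so the order is repeatable).
sub top_n {
    my ( $count, $n ) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    $n = @sorted if $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}

# Sample data (assumed); real code would open the two files by name instead.
my $categories = "AAAAAAAA\nBBBBBBBB\nCCCCCCCC\n";
my $entries    = "AAAAAAAA-1\nBBBBBBBB-2\nAAAAAAAA-3\nZZZZZZZZ-4\n";

open my $cat_fh, '<', \$categories or die "Cannot open categories: $!";
my $count = load_categories($cat_fh);
close $cat_fh;

open my $ent_fh, '<', \$entries or die "Cannot open entries: $!";
count_entries( $ent_fh, $count, 8 );
close $ent_fh;

# Print category and count, tab separated, for the top 100 (or fewer).
for my $cat ( top_n( $count, 100 ) ) {
    print "$cat\t$count->{$cat}\n";
}
```

Sorting every key is the simplest approach and should be fine as long as the category list fits comfortably in memory. Entries whose first 8 characters match no category are simply ignored.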

Run the program under the debugger (perl -d) and use it to examine what those hashes and any other variables look like. The debugger is quick to learn if all you need is basic inspection for your own enlightenment.
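
Inside the debugger, the x command dumps a structure (for example x \%count). If you would rather look at the same thing from within the script, the core Data::Dumper module is a different technique that gives a similar view. A minimal sketch, assuming a hash named %count with made-up sample data:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # sorted keys make the dump easier to read

# Hypothetical sample data standing in for the real category counts.
my %count = ( AAAAAAAA => 2, BBBBBBBB => 0 );

# Dumper takes a reference and returns a string showing the structure.
my $dump = Dumper( \%count );
print $dump;
```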

perldoc is your friend. Pages like perldsc, perlop, and perlfunc will all help you solve this pretty quickly, and they include example code much of the time.

Hope you find this helpful...

Update:

Note that using a hash rules out duplicate categories, because hash keys are unique. That simplifies the code and probably makes it more efficient as well.

Restated the increment step for clarity (I hope).
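
On the duplicate-categories point, a tiny sketch of what happens when the same category shows up twice in the input. The list here is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical category list containing a repeated entry.
my @category_lines = qw(AAAAAAAA BBBBBBBB AAAAAAAA);

my %count;
$count{$_} = 0 for @category_lines;    # the second AAAAAAAA reuses the existing key

my @unique = sort keys %count;
print scalar(@unique), " unique categories: @unique\n";
```

No explicit "have I seen this one already?" check is needed, since assigning to an existing key just overwrites its value.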

...the majority is always wrong, and always the last to know about it...

Insanity: Doing the same thing over and over again and expecting different results...

A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact


In reply to Re: Perl text processing by wjw
in thread Perl text processing by biboshakan
