Hello Maire

while ( $mycorpus{$filename} =~ /date:#(\d+)#/g ) { $date = $1; }
This keeps searching for a date and, each time one is found, overwrites the previous one. An if might be better than a while there. And if you want the last one, /.*date:#(\d+)#/ might do the trick (the .* makes perl read the whole string first, then backtrack until "date" matches; add the /s modifier if the string can contain newlines, since . does not match a newline otherwise). If you use an if, you might ask yourself whether an else is required (calling next to jump to the next file might be an option).
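
To make the difference concrete, here is a minimal sketch of both variants. The sample string and the variable names $text, $first and $last are made up for illustration; only the two regexes come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical record with two dates in it (not taken from the real corpus).
my $text = 'date:#20180101# comment:#first# date:#20180202# comment:#second#';

# First date: a plain "if" stops at the first match.
my $first;
if ( $text =~ /date:#(\d+)#/ ) {
    $first = $1;
}

# Last date: the greedy .* swallows the whole string, then backtracks
# until "date:#...#" matches. /s lets . cross newlines as well.
my $last;
if ( $text =~ /.*date:#(\d+)#/s ) {
    $last = $1;
}

print "first=$first last=$last\n";
```

Both matches run once per string, so neither loops over every date the way the while version does.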

Then, rather than collecting all the datasets and then trying to search them separately anyway, you could process them as you find them:

while ( $mycorpus{$filename} =~ /comment:#(.*?)#/g ) {
    my $dataset = $1;
    while ( $dataset =~ /(\w+) (?= ( (?:\s\w+){4} ) )/gx ) {
        $counts{$date}{"$1 $2"}++;
    }
}
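
Put together with the if/else/next idea, the whole loop might look like the sketch below. The contents of %mycorpus here are invented sample data, and the final print loop is only there to show the result. One small departure from the snippet above: the key is built as "$1$2", because $2 already starts with the whitespace matched by \s.

```perl
use strict;
use warnings;

# Hypothetical corpus: hash key => raw text (sample data, not the real input).
my %mycorpus = (
    doc1 => 'date:#20180101# comment:#one two three four five six#',
    doc2 => 'no date here comment:#a b c d e#',
);

my %counts;
for my $key ( sort keys %mycorpus ) {
    my $date;
    if ( $mycorpus{$key} =~ /date:#(\d+)#/ ) {
        $date = $1;
    }
    else {
        next;    # no date found: skip this entry entirely
    }

    # Count 5-grams as each comment is found, instead of storing the comments.
    while ( $mycorpus{$key} =~ /comment:#(.*?)#/g ) {
        my $dataset = $1;
        while ( $dataset =~ /(\w+) (?= ( (?:\s\w+){4} ) )/gx ) {
            $counts{$date}{"$1$2"}++;
        }
    }
}

# Sorting matters here only because we are printing, not storing.
for my $date ( sort keys %counts ) {
    for my $ngram ( sort keys %{ $counts{$date} } ) {
        print "$date\t$ngram\t$counts{$date}{$ngram}\n";
    }
}
```

With this sample data, doc2 is skipped because it has no date, and doc1 yields two 5-grams.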

Your code says "filename" when the values are actually hash keys, but if you had your input data in files and didn't read them all at once, you would also save some memory. By the way, since your output is also a hash (i.e., unordered), sorting the keys before inserting into it has no effect.
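
As a rough sketch of the one-file-at-a-time idea: the code below first writes two throwaway sample files into a temporary directory (purely so the example runs on its own), then reads and processes them one by one. The file names, their contents and the .txt filter are assumptions; File::Temp is a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two hypothetical input files so the sketch is self-contained.
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    'a.txt' => "date:#20180101# comment:#one two three four five#\n",
    'b.txt' => "date:#20180101# comment:#one two three four five#\n",
);
for my $name ( keys %sample ) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$out} $sample{$name};
    close $out or die "Cannot close $dir/$name: $!";
}

opendir my $dh, $dir or die "Cannot open $dir: $!";
my @files = grep { /\.txt\z/ } readdir $dh;
closedir $dh;

my %counts;
for my $name (@files) {
    open my $in, '<', "$dir/$name" or die "Cannot read $dir/$name: $!";
    my $text = do { local $/; <$in> };    # slurp this one file only
    close $in;

    next unless $text =~ /date:#(\d+)#/;
    my $date = $1;

    while ( $text =~ /comment:#(.*?)#/g ) {
        my $dataset = $1;
        while ( $dataset =~ /(\w+) (?= ( (?:\s\w+){4} ) )/gx ) {
            $counts{$date}{"$1$2"}++;
        }
    }
    # $text goes out of scope here, so only %counts keeps growing.
}

print "$_\t$counts{'20180101'}{$_}\n" for sort keys %{ $counts{'20180101'} };
```

Only one file's text is held in memory at any time; the counts hash is the only structure that accumulates.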


In reply to Re: Reducing memory usage on n-grams script by Eily
in thread Reducing memory usage on n-grams script by Maire
