Re: Reducing memory usage on n-grams script

Hello Maire

    while ( $mycorpus{$filename} =~ /date:#(\d+)#/g ) {
        $date = $1;
    }

This keeps searching for a date and, each time one is found, overwrites the previous one, so you end up with the last date in the string. If the first date is all you need, an if would be better than a while there. And if you do want the last one, /.*date:#(\d+)#/ might do the trick (the greedy .* makes perl run to the end of the string first, and then backtrack until "date" matches; add the /s modifier if the text can contain newlines, since . does not match a newline by default). If you use an if, you might ask yourself whether an else is required (calling next to jump to the next file might be an option).
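To make the difference concrete, here is a minimal sketch of both variants. The contents of %mycorpus are made up for illustration, and %first and %last are hypothetical names that are not in your script:

```perl
use strict;
use warnings;

# Hypothetical stand-in for %mycorpus; keys and contents are made up.
my %mycorpus = (
    'a.txt' => 'date:#20180101# comment:#first# date:#20180202#',
    'b.txt' => 'comment:#no date in this one#',
);

my ( %first, %last );
for my $filename ( sort keys %mycorpus ) {
    my $text = $mycorpus{$filename};

    # First date only: one match attempt, no /g loop.
    if ( $text =~ /date:#(\d+)#/ ) {
        $first{$filename} = $1;
    }
    else {
        next;    # no date at all: skip to the next entry
    }

    # Last date: the greedy .* consumes the string, then backtracks.
    # /s lets . match newlines in case the text spans several lines.
    if ( $text =~ /.*date:#(\d+)#/s ) {
        $last{$filename} = $1;
    }

    print "$filename: first=$first{$filename} last=$last{$filename}\n";
}
```

The entry without a date is skipped by the next, so nothing is ever counted under a stale or undefined $date.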

Then, rather than collecting all the datasets and then trying to search them separately anyway, you could process them as you find them:

    while ( $mycorpus{$filename} =~ /comment:#(.*?)#/g ) {
        my $dataset = $1;
        while ($dataset =~ /(\w+) (?= ( (?:\s\w+){4} ) )/gx) {
          $counts{$date}{"$1 $2"}++;
        }
    }

Your code says "filename" when the values are actually hash keys, but if you kept your input data in files, and didn't read them all at once, you would also save some memory. By the way, since your output is also a hash (i.e., unordered), sorting the keys has no effect.
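As a rough sketch of the one-file-at-a-time idea: the script below writes two made-up sample files to a temporary directory, then reads and counts them one by one. The file names, their contents, and the @paths variable are assumptions for illustration only. Your real input would already be on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical input: two small files in a temporary directory.
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    'a.txt' => 'date:#20180101# comment:#one two three four five six#',
    'b.txt' => 'date:#20180102# comment:#a b c d e f#',
);
for my $name ( keys %sample ) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$out} $sample{$name};
    close $out or die "Cannot close $dir/$name: $!";
}
my @paths = map { "$dir/$_" } sort keys %sample;

my %counts;
for my $path (@paths) {
    # Slurp a single file; only one file's text is held at a time.
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;

    # List assignment is false when the match fails, so dateless files are skipped.
    my ($date) = $text =~ /date:#(\d+)#/ or next;

    while ( $text =~ /comment:#(.*?)#/g ) {
        my $dataset = $1;
        while ( $dataset =~ /(\w+) (?= ( (?:\s\w+){4} ) )/gx ) {
            $counts{$date}{"$1 $2"}++;
        }
    }
}

# Sort when reading the hash out, not when filling it.
for my $date ( sort keys %counts ) {
    for my $ngram ( sort keys %{ $counts{$date} } ) {
        print "$date\t$ngram\t$counts{$date}{$ngram}\n";
    }
}
```

Only %counts grows with the size of the corpus here. The sort is applied at print time, which is the one place where it changes what you see.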

Re^2: Reducing memory usage on n-grams script by Maire (Scribe) on Sep 02, 2018 at 07:02 UTC
Thank you very much for this!