in reply to Reducing memory usage on n-grams script

#!/usr/bin/perl

# https://perlmonks.org/?node_id=1221461

use strict;
use warnings;

my %mycorpus = (
    a => "date:#20180101# comment:#d1 d2 d3 d4 d5 d6#",
    b => "date:#20180101# comment:#b1 b2 b3 b4 b5 b6 b7# comment:#c1 c2 c3 c4 c5 c6#",
    c => "date:#20180101# comment:#d1 d2 d3 d4 d5 d6#",
);

# Stream every 5-gram to an external sort/uniq pipeline
# instead of counting them in an in-memory hash.
open my $fh, '|-', 'sort | uniq -c' or die;
for ( values %mycorpus )
{
    my ($date) = /date:#(\d+)#/;
    for ( /comment:#(.*?)#/g )
    {
        my @words = split;
        # One output line per 5-word window; comments shorter than 5 words yield none.
        print $fh "$date @words[$_..$_+4]\n" for 0 .. @words - 5;
    }
}
close $fh;

Outputs:

      1 20180101 b1 b2 b3 b4 b5
      1 20180101 b2 b3 b4 b5 b6
      1 20180101 b3 b4 b5 b6 b7
      1 20180101 c1 c2 c3 c4 c5
      1 20180101 c2 c3 c4 c5 c6
      2 20180101 d1 d2 d3 d4 d5
      2 20180101 d2 d3 d4 d5 d6

Re^2: Reducing memory usage on n-grams script
by Maire (Scribe) on Sep 02, 2018 at 07:03 UTC
    Thanks!