Your OP code is slow:
my %freq; for my $i ( 1 .. @data ) { $freq{ $1 }[ $-[0] ] += 1/@data while $data[ $i-1 ] =~ /(.)/g; }
For several reasons (in roughly scale of penalty):
Moving that out of the loop and adding the invariant is simple and effective.
Changing the loop parameters to for my $i ( 0 .. $#data ) avoids that repeated subtraction.
Why generate a list of integers, and then use those integers to index into the array, when you can access the array elements directly?
Most of those can be avoided like this:
my $inc = 1 / @data; ## do the i +nvariant once: for ( @data ) { ## $_ alias +es each line in turn avoids both the $i-1 and indexing the array $freq{ $1 }[ $-[0] ] += $inc while /(.)/g; ## the rege +x engine operates directly on the $_ alias. }
But that still incurs the penalty of the regex engine to get the chars 1 at a time. A naive alternative is ... for split //; but that still involves the regex engine, albeit in a faster mode.
The regex can be avoided completely using ... for unpack '(a1)*', $_;, but that template still requires parsing, albeit a simpler language; but the big penalty here is the construction of a long list of single chars, each in its own scalar that each require over 100 bytes of memory. That is a costly process.
Contrast that with the substr version $freq{ substr $data, $_, 1 }[ $_ ] += $inc for 0 .. length( $d )-1;. That numerical iterator costs only ~20 bytes regardless of the length of the string.
My other version trades the (fairly expensive) call to substr and the iterator, for a very cheap operation chop and an increment. And a little obscurity.
It can however be made to be the fastest of the lot with a couple of simple changes:
for( @data ) { ## use $_ to alias the +lines my $p = -1; ## Allow pre-increment +which is more efficient $freq{ chop() }[ ++$p ] += $inc while $_; ## implicit arg for cho +p; avoid call to length. }
That might see you over the 6X mark. Or not.
In reply to Re^3: Weighted frequency of characters in an array of strings
by BrowserUk
in thread Weighted frequency of characters in an array of strings
by K_Edw
For: | Use: | ||
& | & | ||
< | < | ||
> | > | ||
[ | [ | ||
] | ] |