in reply to improving speed in ngrams algorithm
My answer treats n-grams over characters, not words.
A regex should be faster; this debugger demo for n=3 should give you a start.
  DB<30> $str = join "", a..l
  DB<31> @res=()
  DB<32> for my $start (0..2) { pos($str) = $start; push @res, $str =~ m/(.{3})/g }
  DB<33> x @res
0  'abc'
1  'def'
2  'ghi'
3  'jkl'
4  'bcd'
5  'efg'
6  'hij'
7  'cde'
8  'fgh'
9  'ijk'
NB:
(I know it's possible in a single regex without looping over start by playing around with \K or similar. I'll leave it to the regex gurus like tybalt to show it ;-)
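For what it's worth, here is a minimal sketch of one loop-free variant. It uses a capture group inside a zero-width lookahead rather than \K, so it is a swapped-in technique and not necessarily what the gurus would post. The helper name char_ngrams is made up for illustration. Note that the n-grams come out in positional order, not grouped by start offset as in the demo above.

```perl
use strict;
use warnings;

# Hypothetical helper: all overlapping character n-grams of $str, in positional order.
sub char_ngrams {
    my ($str, $n) = @_;
    # The lookahead itself consumes nothing, so after each (empty) match the
    # engine moves on by one character, while the inner group captures the
    # next $n characters. In list context m//g returns all captures.
    return $str =~ /(?=(.{$n}))/g;
}

my @res = char_ngrams(join('', 'a' .. 'l'), 3);
print "@res\n";
```

As in the demo, . does not match a newline unless you add the /s modifier.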
HTH! :)
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice
In case you really want to include non-letters, try unpack.
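A rough sketch of how the unpack route could look, assuming plain single-byte text; the helper name unpack_ngrams is hypothetical. The a format takes any character, newlines included. The final chunk at each offset can come out shorter than n, so it is filtered away by length.

```perl
use strict;
use warnings;

# Hypothetical helper: overlapping character n-grams via unpack,
# grouped by start offset like the regex loop demo.
sub unpack_ngrams {
    my ($str, $n) = @_;
    my @res;
    for my $start (0 .. $n - 1) {
        # Stop once no full-length n-gram can start at this offset.
        last if $start > length($str) - $n;
        # "(a$n)*" cuts the remainder into consecutive $n-character chunks;
        # the last chunk may be shorter than $n, so keep only full ones.
        push @res, grep { length($_) == $n }
                   unpack "(a$n)*", substr($str, $start);
    }
    return @res;
}

my @res = unpack_ngrams(join('', 'a' .. 'l'), 3);
print "@res\n";
```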
Re^2: improving speed in ngrams algorithm (updated)
by Eily (Monsignor) on Jun 11, 2019 at 12:22 UTC
by LanX (Saint) on Jun 11, 2019 at 12:39 UTC