in reply to improving speed in ngrams algorithm

Benchmarking left to someone who cares :)

#!/usr/bin/perl

# https://perlmonks.org/?node_id=11101225

use strict;
use warnings;

my $sentence = "this is the text to play with";
my $ngramWindow_MIN = 2;
my $ngramWindow_MAX = 3;

my ($low, $high) = ($ngramWindow_MIN - 1, $ngramWindow_MAX - 1);
$sentence =~ /(?<!\S)\S+(?: \S+){$low,$high}?(?!\S)(?{ print "START INDEX: @{[$` =~ tr| || ]} : $&\n" })(*FAIL)/;

Output (same lines as the original code, in a slightly different order):

START INDEX: 0 : this is
START INDEX: 0 : this is the
START INDEX: 1 : is the
START INDEX: 1 : is the text
START INDEX: 2 : the text
START INDEX: 2 : the text to
START INDEX: 3 : text to
START INDEX: 3 : text to play
START INDEX: 4 : to play
START INDEX: 4 : to play with
START INDEX: 5 : play with
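
For anyone who does want to benchmark, here is a rough sketch of a plain-loop baseline to compare the regex against. It is not code from this thread: the sub name ngrams_by_slice is made up for illustration, and it assumes words are separated by single spaces, so the word position from split matches the space count the regex uses as its start index. Under that assumption it should print the same eleven lines, in the same order, as the regex version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every n-gram
# of $min..$max words, formatted like the regex version's output lines.
sub ngrams_by_slice {
    my ($sentence, $min, $max) = @_;
    my @words = split ' ', $sentence;
    my @out;
    for my $start (0 .. $#words) {
        for my $n ($min .. $max) {
            my $end = $start + $n - 1;
            last if $end > $#words;    # window would run past the last word
            # An array slice in a string is joined with single spaces.
            push @out, "START INDEX: $start : @words[$start .. $end]";
        }
    }
    return @out;
}

print "$_\n" for ngrams_by_slice("this is the text to play with", 2, 3);
```

The shorter window is tried first at each start position, which mirrors the lazy {$low,$high}? quantifier in the regex.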