Re: improving speed in ngrams algorithm

Benchmarking left to someone who cares :)

#!/usr/bin/perl

# https://perlmonks.org/?node_id=11101225

use strict;
use warnings;

my $sentence = "this is the text to play with";
my $ngramWindow_MIN = 2;
my $ngramWindow_MAX = 3;

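# Each n-gram is one word followed by $low to $high further words.
my ($low, $high) = ($ngramWindow_MIN - 1, $ngramWindow_MAX - 1);

# The (?{ ... }) block prints the current candidate match, then (*FAIL)
# forces backtracking so the regex engine visits every possible n-gram.
# The start index is the number of spaces in the prematch ($`).
$sentence =~ /(?<!\S)\S+(?: \S+){$low,$high}?(?!\S)(?{
  print "START INDEX: @{[$` =~ tr| || ]} : $&\n"
  })(*FAIL)/;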

Outputs (same lines, slightly different order):

START INDEX: 0 : this is
START INDEX: 0 : this is the
START INDEX: 1 : is the
START INDEX: 1 : is the text
START INDEX: 2 : the text
START INDEX: 2 : the text to
START INDEX: 3 : text to
START INDEX: 3 : text to play
START INDEX: 4 : to play
START INDEX: 4 : to play with
START INDEX: 5 : play with
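
For anyone who does want to benchmark, here is a plain split-and-slice version to compare against. It is only a sketch, not code from the original thread: the `ngrams` helper name and its argument order are made up for illustration. It collects the n-grams into a list rather than printing from inside a regex, and for the sample sentence it should give the same lines as above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [start_index, ngram]
# pairs for every window of $min to $max words in $text.
sub ngrams {
    my ($text, $min, $max) = @_;
    my @words = split ' ', $text;    # splits on any run of whitespace
    my @found;
    for my $start (0 .. $#words) {
        for my $n ($min .. $max) {
            my $end = $start + $n - 1;
            last if $end > $#words;  # window would run past the last word
            push @found, [ $start, join ' ', @words[$start .. $end] ];
        }
    }
    return @found;
}

print "START INDEX: $_->[0] : $_->[1]\n"
    for ngrams("this is the text to play with", 2, 3);
```

One difference to keep in mind: split ' ' treats any run of whitespace as a separator, while the regex above expects words separated by exactly one space, so the two can disagree on input with tabs or doubled spaces.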
