A solution using split, array slices and shift. I haven't run any benchmarks, so I can't say whether it is fast or slow.
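use 5.026;
use warnings;

my $text = q{this is the text to play with};

for ( 1 .. 8 )
{
    say qq{$_-word ngrams of '$text'};
    say for nGramWords( $_, $text );
    say q{-} x 20;
}

# Return every run of $nWords consecutive words in $string,
# each prefixed with the index of its first word.
sub nGramWords
{
    my( $nWords, $string ) = @_;
    return if $nWords < 1;    # guard: a window of 0 words would loop forever

    my @words = split m{\s+}, $string;
    my $start = 0;
    my @nGrams;

    # Slice the first $nWords words, then shift one word off the front
    # so the next slice starts one word later.
    while ( scalar @words >= $nWords )
    {
        push @nGrams, join q{ },
            qq{START INDEX: @{ [ $start ++ ] } : },
            @words[ 0 .. $nWords - 1 ];
        shift @words;
    }

    return @nGrams;
}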
The output.
1-word ngrams of 'this is the text to play with'
START INDEX: 0 : this
START INDEX: 1 : is
START INDEX: 2 : the
START INDEX: 3 : text
START INDEX: 4 : to
START INDEX: 5 : play
START INDEX: 6 : with
--------------------
2-word ngrams of 'this is the text to play with'
START INDEX: 0 : this is
START INDEX: 1 : is the
START INDEX: 2 : the text
START INDEX: 3 : text to
START INDEX: 4 : to play
START INDEX: 5 : play with
--------------------
3-word ngrams of 'this is the text to play with'
START INDEX: 0 : this is the
START INDEX: 1 : is the text
START INDEX: 2 : the text to
START INDEX: 3 : text to play
START INDEX: 4 : to play with
--------------------
4-word ngrams of 'this is the text to play with'
START INDEX: 0 : this is the text
START INDEX: 1 : is the text to
START INDEX: 2 : the text to play
START INDEX: 3 : text to play with
--------------------
5-word ngrams of 'this is the text to play with'
START INDEX: 0 : this is the text to
START INDEX: 1 : is the text to play
START INDEX: 2 : the text to play with
--------------------
6-word ngrams of 'this is the text to play with'
START INDEX: 0 : this is the text to play
START INDEX: 1 : is the text to play with
--------------------
7-word ngrams of 'this is the text to play with'
START INDEX: 0 : this is the text to play with
--------------------
8-word ngrams of 'this is the text to play with'
--------------------
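Since I haven't benchmarked the routine above, here is a rough sketch of a variant to compare it against. It uses the same split and array slice but moves the window by index instead of shifting words off the array. The name nGramWordsByIndex is made up for this illustration, and the START INDEX label is left out to keep it short. I have not timed it, so I make no claim that it is faster.

```perl
use 5.026;
use warnings;

# Hypothetical variant of nGramWords: same split and slice, but the
# window is moved by index rather than by shifting the array.
sub nGramWordsByIndex
{
    my( $nWords, $string ) = @_;
    return if $nWords < 1;

    my @words     = split m{\s+}, $string;
    my $lastStart = @words - $nWords;

    # The range 0 .. $lastStart is empty when there are fewer than
    # $nWords words, so nothing is returned in that case.
    return map { join( q{ }, @words[ $_ .. $_ + $nWords - 1 ] ) }
        0 .. $lastStart;
}

say for nGramWordsByIndex( 3, q{this is the text to play with} );
```

For the sample text this prints the same five 3-word n-grams as in the output above, without the labels. The core Benchmark module could be used to time the two routines against each other.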
I hope this is of interest.