"A regex should be faster"

I would already doubt that a regex is faster than accessing array elements in normal circumstances, but here you seem to have missed the fact that the n-grams are made of words rather than chars. So your regex becomes /((\w+\s?){3})/g, where each char of (part of) the string is checked to find spaces. In IB2017's solution this is done once by the split.
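To make the comparison concrete, here is a minimal sketch of the split-then-slice idea. It is not IB2017's actual code: the helper name word_ngrams and the sample sentence are made up for illustration. The point is only that the whitespace is scanned once by split, and every n-gram after that is a plain array slice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): word n-grams built from
# a single split followed by array slices.
sub word_ngrams {
    my ($text, $n) = @_;
    my @words = split ' ', $text;      # whitespace is scanned once, here
    return () if @words < $n;          # not enough words for one n-gram
    # One slice per starting index; no further scanning of the string.
    return map { join ' ', @words[$_ .. $_ + $n - 1] } 0 .. @words - $n;
}

print "$_\n" for word_ngrams('the quick brown fox jumps', 3);
# the quick brown
# quick brown fox
# brown fox jumps
```

Whether this beats a regex on your data is something only a benchmark on real input can settle.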
"I know it's possible in a single regex without looping over start by playing around with \K or similar"

Lookahead assertions can help:

    DB<7> say for 'perlmonks' =~ /(?=(.{3}))./g
    per
    erl
    rlm
    lmo
    mon
    onk
    nks

But it becomes cumbersome when working with words, /(?=((\w+\s?){3}))\w+/g, and is probably not faster.
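For what it's worth, here is a sketch of how the word version might be tightened up. As far as I can tell, the word pattern quoted above can still match near the end of the string by breaking the last word into pieces, because \s? is optional and \w+ can backtrack. The variant below is my own assumption of a fix, not something from the thread: it requires real whitespace between words and anchors both ends with \b. The helper name trigram_words_lookahead is hypothetical, and "word" here means a run of \w characters, so punctuation is handled differently than with split.

```perl
use strict;
use warnings;

# Hypothetical helper: overlapping word trigrams with a lookahead.
# Must be called in list context to get the captured trigrams back.
sub trigram_words_lookahead {
    my ($text) = @_;
    # \b            start only at the beginning of a word
    # (?=( ... )\b) capture three whole words without consuming them
    # \w+           then consume one word so the next match starts later
    return $text =~ /\b(?=(\w+(?:\s+\w+){2})\b)\w+/g;
}

print "$_\n" for trigram_words_lookahead('the quick brown fox jumps');
# the quick brown
# quick brown fox
# brown fox jumps
```

It works, but each word is still scanned several times by the lookahead, which is why I would not expect it to be faster than splitting once.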
"In case you really want to include non-letters try unpack"

unpack would probably be among the fastest solutions for character n-grams indeed.
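A minimal sketch of one way to do character n-grams with unpack, assuming plain ASCII (or byte) input and n of at least 2. The helper name char_ngrams_unpack is hypothetical, and building the template with an explicit repeat count is just one option among several. Each "a3 X2" step reads three characters and then backs up two, so the next read starts one character further along.

```perl
use strict;
use warnings;

# Hypothetical helper: character n-grams via an explicit unpack template.
# Assumes $n >= 2 and ASCII (or byte) input.
sub char_ngrams_unpack {
    my ($str, $n) = @_;
    my $count = length($str) - $n + 1;   # number of n-grams in the string
    return () if $count < 1;
    my $back = $n - 1;
    # e.g. for $n == 3: "a3 X2 a3 X2 ..." repeated $count times
    my $template = "a$n X$back " x $count;
    return unpack $template, $str;
}

print join(' ', char_ngrams_unpack('perlmonks', 3)), "\n";
# per erl rlm lmo mon onk nks
```

This produces the same trigrams as the lookahead one-liner on 'perlmonks'; again, benchmark before assuming which one wins.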
In reply to Re^2: improving speed in ngrams algorithm (updated) by Eily, in thread improving speed in ngrams algorithm by IB2017