in reply to Re: improving speed in ngrams algorithm (updated)
in thread improving speed in ngrams algorithm
> A regex should be faster

I would already doubt that a regex is faster than accessing array elements under normal circumstances, but here you seem to have missed the fact that the n-grams are made of words rather than chars. So your regex becomes /((\w+\s?){3})/g, where each char of (part of) the string is checked to find spaces. In IB2017's solution this is done once by the split.
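To illustrate the "split once, then index into the array" idea, here is a minimal sketch. It is not IB2017's actual code; the sub name word_ngrams and the sample sentence are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): word n-grams built from
# one split followed by array slices.
sub word_ngrams {
    my ($text, $n) = @_;
    my @words = split ' ', $text;    # whitespace is scanned only once, here
    return () if $n < 1 || @words < $n;
    # One slice of $n consecutive words per starting index.
    return map { join ' ', @words[ $_ .. $_ + $n - 1 ] } 0 .. @words - $n;
}

print "$_\n" for word_ngrams('the quick brown fox jumps', 3);
```

For the sample sentence this prints the three word trigrams "the quick brown", "quick brown fox" and "brown fox jumps". After the split, each n-gram only costs an array slice and a join.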
> I know it's possible in a single regex without looping over start by playing around with \K or similar

Look ahead assertions can help:

  DB<7> say for 'perlmonks' =~ /(?=(.{3}))./g
  per
  erl
  rlm
  lmo
  mon
  onk
  nks

But it becomes cumbersome when working with words, /(?=((\w+\s?){3}))\w+/g, and is probably not faster.
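For comparison, a word-level lookahead could be sketched as below. This is a variant, not the pattern as posted: as far as I can tell, the optional \s? lets backtracking split a single word into several "words" near the end of the string, so this sketch requires \s+ between words and anchors at \b. The sub name word_ngrams_regex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: overlapping word n-grams via a lookahead.
# The lookahead captures $n words without consuming them; the trailing
# \w+ then consumes just one word, so the next match starts at the
# following word.
sub word_ngrams_regex {
    my ($text, $n) = @_;
    my $rest = $n - 1;    # words that follow the first one
    return $text =~ /\b(?=(\w+(?:\s+\w+){$rest}))\w+/g;
}

print "$_\n" for word_ngrams_regex('the quick brown fox jumps', 3);
```

With a single capture group, the match in list context returns exactly one string per n-gram. Every character is still inspected by the regex engine, which is the reason given above for doubting that it beats split plus slices; only a benchmark on real data would settle it.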
> In case you really want to include non-letters try unpack

unpack would indeed probably be among the fastest solutions for character n-grams.
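A possible unpack-based sketch for character n-grams follows. It assumes single-byte (e.g. ASCII) input, and the sub name char_ngrams is made up here. The template reads $n characters with "a" and then steps back $n - 1 positions with "X", so consecutive reads overlap.

```perl
use strict;
use warnings;

# Hypothetical helper: overlapping character n-grams with unpack.
# Assumes single-byte characters; "X" backs up by bytes.
sub char_ngrams {
    my ($str, $n) = @_;
    my $count = length($str) - $n + 1;    # number of n-grams
    return () if $n < 1 || $count < 1;
    my $back = $n - 1;
    # e.g. "(a3X2)7": read 3 chars, back up 2, repeated 7 times
    my $template = '(a' . $n . ( $back ? "X$back" : '' ) . ")$count";
    return unpack $template, $str;
}

print join( ' ', char_ngrams( 'perlmonks', 3 ) ), "\n";
```

For 'perlmonks' and n = 3 this gives the same seven trigrams as the lookahead one-liner. The explicit repeat count is a deliberate choice here so that the template never reads past the end of the string.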
Re^3: improving speed in ngrams algorithm (updated)
by LanX (Saint) on Jun 11, 2019 at 12:39 UTC