in reply to Re^2: improving speed in ngrams algorithm (updated)
in thread improving speed in ngrams algorithm
I saw split // not split / /
That's why added the NB part saying to exclude white spaces and punctuation (which isn't done in the OP s code)
I haven't run ° it but the code looks broken to me if the split wasn't meant to be per character. @string holding words doesn't make sense to me!
I don't think that you can effectively process a natural language without regex.
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice
°) I ran it on my mobile and the output shows that the OP is looking for n words in a row. Hence we both misunderstood his definition of n gram
START INDEX: 0 :this is START INDEX: 1 :is the START INDEX: 2 :the text START INDEX: 3 :text to START INDEX: 4 :to play START INDEX: 5 :play with START INDEX: 0 :this is the START INDEX: 1 :is the text START INDEX: 2 :the text to START INDEX: 3 :text to play START INDEX: 4 :to play with
|
---|