
... the number of 3-grams in a string is simply length( $string ) - 2 ;)
<picky_mode>
Only assuming the string does not contain whitespace.
</picky_mode>
```perl
use strict;
print map { (length($_) - 2) . "\n" } split('\s+', "Just Another Perl Hacker");
__END__
2
5
2
4
```
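The rule generalises: a string of L characters contains L - n + 1 contiguous n-grams, and none at all when L < n. Applied per word, the one-liner above would print -1 for a one-letter word such as "a". Here is a minimal sketch that clamps the count at zero. `ngram_count` is a hypothetical helper name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): number of contiguous
# n-grams in a string. A string of length L contains L - n + 1 of them,
# or none at all when L < n.
sub ngram_count {
    my ($string, $n) = @_;
    my $count = length($string) - $n + 1;
    return $count > 0 ? $count : 0;
}

# Trigram count per whitespace-separated word; the trailing "a" shows
# the clamp to zero instead of the -1 that length($_) - 2 would give.
my @words = split /\s+/, "Just Another Perl Hacker a";
print "$_: ", ngram_count($_, 3), "\n" for @words;
```

For the sample input this prints 2, 5, 2 and 4 for the four words, as before, and 0 for "a".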

Re^3: finding number of contiguous letters
by graff (Chancellor) on May 23, 2007 at 11:36 UTC
    Depending on the application, it might be appropriate to include spaces (and/or punctuation) in the n-gram list:
    "jus" "ust" "st " "t a" " an" "ano" "not" ...
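For graff's variant, here is a rough sketch that slides a window of width n over the whole string with `substr`, so spaces end up inside the trigrams. `ngrams` is a hypothetical helper, and lower-casing the input first is an assumption made to match the list above:

```perl
use strict;
use warnings;

# Hypothetical helper: every contiguous n-gram of a string, spaces and
# punctuation included, in order of their starting position.
# Strings shorter than $n give an empty list (the range 0 .. -1 is empty).
sub ngrams {
    my ($string, $n) = @_;
    return map { substr($string, $_, $n) } 0 .. length($string) - $n;
}

my @trigrams = ngrams(lc("Just Another Perl Hacker"), 3);

# Quote each trigram so leading and trailing spaces stay visible.
print join(' ', map { qq{"$_"} } @trigrams), "\n";
```

The output starts "jus" "ust" "st " "t a" " an" "ano" "not", and there are 22 trigrams in total, which is again length minus two for the whole 24-character string.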
Re^3: finding number of contiguous letters
by blazar (Canon) on May 23, 2007 at 11:40 UTC
    <picky_mode>
    Only assuming the string does not contain whitespace.
    </picky_mode>

    Why so? If we're dealing with plain sequences of letters from an alphabet, of which one thing called "whitespace" is part, then the latter should not be special in any way. If you have some particular application in mind, then YMMV. To quote from Wikipedia:

    For sequences of characters, the 3-grams (sometimes referred to as "trigrams") that can be generated from "good morning" are "goo", "ood", "od ", "d m", " mo", "mor" and so forth. Some practitioners preprocess strings to remove spaces, most simply collapse whitespace to a single space while preserving paragraph marks.
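As a sketch of the preprocessing the quote mentions, assuming you only want to collapse whitespace (preserving paragraph marks is not handled here), each run of whitespace is squeezed to a single space before the trigrams are taken. `trigrams_collapsed` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace (spaces, tabs,
# newlines) to one space, then return all contiguous character trigrams.
# Paragraph marks are NOT preserved in this simplified sketch.
sub trigrams_collapsed {
    my ($text) = @_;
    (my $clean = $text) =~ s/\s+/ /g;
    return map { substr($clean, $_, 3) } 0 .. length($clean) - 3;
}

# Extra spaces and a tab between the words collapse to one space.
my @grams = trigrams_collapsed("good \t  morning");
print join(', ', map { qq{"$_"} } @grams), "\n";
```

With that input it yields the ten trigrams from "goo", "ood", "od ", "d m", " mo", "mor" through "ing"; ten is 12 - 2, the length of "good morning" minus two.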
      I always thought whitespace was not a letter. I guess I was wrong.