in reply to Re: Exact string matching
in thread Exact string matching

(I got a bit carried away... sorry for my poor formatting earlier.)

Dear Ram, thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined delimiter such as white space. What if the file contains only strings without any delimiter (a chunk of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear).

example: 

$text = 'howdoidoit' 

and the answer should be like 

For substrings of length 3:

how - 1 
owd - 1 
wdo - 1 
doi - 2 
oid - 1 
ido - 1 
oit - 1

Re^4: Exact string matching
by BrowserUk (Patriarch) on Oct 16, 2011 at 20:05 UTC

    The fastest way to n-tuple long strings is using unpack:

    $text = 'howdoidoit';;

    print for unpack '(a3X2)*', $text;;
    how owd wdo doi oid ido doi oit it it

    print for unpack '(a4X3)*', $text;;
    howd owdo wdoi doid oido idoi doit oit oit oit

    You have to discard the last n-1 results but that is very quick and simple to do.



      ... and this approach can easily be extended to make n dynamic by interpolating the width counts in the unpack template string.

      >perl -wMstrict -le
      "my $text = 'howdoidoit';
       ;;
       my $n = 3;
       my $back = $n - 1;
       ;;
       my @unpacked = unpack qq{(a$n X$back)*}, $text;
       my %count;
       $count{$_}++ for @unpacked[0 .. $#unpacked - $back];
       ;;
       use Data::Dumper;
       print Dumper \%count;
      "
      $VAR1 = {
                'wdo' => 1,
                'ido' => 1,
                'owd' => 1,
                'how' => 1,
                'oid' => 1,
                'oit' => 1,
                'doi' => 2
              };
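
Since the original question asks about all possible substrings rather than a single length, here is a minimal sketch that repeats the count for every length. It swaps in plain substr instead of unpack, so no trailing results need to be discarded. The names count_ngrams and count_all_substrings are hypothetical helpers, not anything from the posts above. Note that a string of length L has on the order of L*L substrings, so enumerating all of them this way is not linear time.

```perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping substring of length $n.
sub count_ngrams {
    my ($text, $n) = @_;
    my %count;
    # The last valid start offset is length - n, so nothing is discarded.
    # If $n exceeds the string length the range is empty and %count stays empty.
    $count{ substr($text, $_, $n) }++ for 0 .. length($text) - $n;
    return \%count;
}

# Hypothetical helper: repeat the count for every length from 1 to length($text).
sub count_all_substrings {
    my ($text) = @_;
    my %by_length;
    $by_length{$_} = count_ngrams($text, $_) for 1 .. length $text;
    return \%by_length;
}

my $all = count_all_substrings('howdoidoit');
print "$_ - $all->{3}{$_}\n" for sort keys %{ $all->{3} };
```

The result is a hash keyed by substring length, so $all->{3}{doi} holds the count of 'doi' among the trigrams.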
Re^4: Exact string matching
by ramprasad27 (Sexton) on Oct 16, 2011 at 16:27 UTC
    Try this:
    foreach ($cont =~ m/([a-z]{3})/g ){ $hash{$_}++; }
    What do you mean by linear time? Lastly, you will need to modify the pattern depending on what you want; please work on it.

      Note that your approach will only return non-overlapping trigrams:

      > perl -wle "print for 'howdoyoudo' =~ /([a-z]{3})/g"
      how
      doy
      oud

      I would advise the original poster to really work on the question and maybe search CPAN for Ngrams or Trigrams.
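
If a regex is still preferred, one way to get overlapping matches is to put the capture inside a zero-width lookahead, so the match itself consumes nothing and the engine advances one character at a time. The following is only a sketch under that assumption: count_ngrams_regex is a hypothetical name, and the pattern uses . with /s (any character) rather than the [a-z] class from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: count overlapping substrings of length $n
# by capturing inside a zero-width lookahead.
sub count_ngrams_regex {
    my ($text, $n) = @_;
    my %count;
    # Each successful match is empty, so /g restarts one character further on;
    # $1 holds the $n characters that follow the current position.
    $count{$1}++ while $text =~ /(?=(.{$n}))/gs;
    return \%count;
}

my $count = count_ngrams_regex('howdoidoit', 3);
print "$_ - $count->{$_}\n" for sort keys %$count;
```

For 'howdoidoit' this counts 'doi' twice and every other trigram once, matching the list in the original question.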