Well, since I'm trying to develop my Perl skills for bioinformatics problems such as this one, I thought I'd give it a shot:
    use warnings;
    use strict;
    use Data::Dumper;

    my %word_counts = ();

    # set for the test code; actual DNA should be parsed into a
    # single-line string with no whitespace
    my $DNA = "CGTAGATCCAGTCGA";

    my $cur_len = 3;                  # set current word length to minimum word length
    my $max_len = (length $DNA) - 1;  # set maximum word length, set here to avoid
                                      # recalculating $DNA length for every iteration

    for (; $cur_len <= $max_len; $cur_len++) {   # for each word length
        # again, set to avoid recalculating for every iteration
        my $last_pos = (length $DNA) - $cur_len;
        for (my $pos = 0; $pos <= $last_pos; $pos++) {
            $DNA =~ m/^.{$pos}(.{$cur_len})/;
            $word_counts{$1}++;
        }
    }

    print Dumper(\%word_counts);
    exit;


The bottleneck here would be the number of word lengths you search, since every extra length means another full pass over the sequence. If you need it to run quickly, you could restrict each program run to a fixed range of lengths. Or at least that's how I'd do it.
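To make the fixed-range idea concrete, here is a minimal sketch. The sub name count_words and its arguments are my own invention for illustration, not anything from the original question, and it uses substr rather than the regex above to pull out each word. It takes the minimum and maximum word length as parameters, so each run can cover just one slice of lengths.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions, not from the
# original post): count every word whose length is between $min_len and
# $max_len, inclusive, in the sequence $dna.
sub count_words {
    my ($dna, $min_len, $max_len) = @_;

    $dna = uc $dna;                 # so that AtC and ATC count as the same word
    my $dna_len = length $dna;

    # A word cannot be longer than the sequence itself.
    $max_len = $dna_len if $max_len > $dna_len;

    my %word_counts;
    for my $len ($min_len .. $max_len) {
        # Slide a window of $len characters over the sequence.
        for my $pos (0 .. $dna_len - $len) {
            $word_counts{ substr($dna, $pos, $len) }++;
        }
    }
    return \%word_counts;
}

# Example run: only word lengths 3 and 4 for the test sequence.
my $counts = count_words("CGTAGATCCAGTCGA", 3, 4);
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

Calling it once per range (say 3 to 10, then 11 to 20) splits the work across runs. I'd expect substr to be cheaper than the regex, which has to skip over $pos characters from the start of the string on every match, but I haven't benchmarked it.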

Hope it helps :)

PS: Would any of the fellow monks be kind enough to tell me if there's a way to keep the code tag from breaking and wrapping lines so early?

UPDATE: I just realized that the code would treat AtC and ATC as different words, so once you have your DNA sequence in the variable you should also make sure it's all upper case or all lower case. For example:
 $DNA = "\U$DNA";

In reply to Re: Exact string matching by Caio
in thread Exact string matching by Anonymous Monk
