I thought I'd whip up a little test to compare the Perl hash-based solution to an Inline::C solution. Looks like the C version is about 85x faster...
Benchmark: timing 20 iterations of hash_string, inline...
hash_string: 37 wallclock secs (37.08 usr +  0.01 sys = 37.09 CPU) @  0.54/s (n=20)
    inline:  1 wallclock secs ( 0.44 usr +  0.00 sys =  0.44 CPU) @ 45.45/s (n=20)
#!/usr/bin/perl

use Inline C;
use Benchmark;

my $gen = "atgcgc"x500000;  #3 million characters

$tests{"inline"}      = sub { string_inline_c($gen, length($gen)) };
$tests{"hash_string"} = sub { hash_string($gen) };

timethese(20, \%tests);

sub hash_string
{
    my ($genome) = @_;
    my %count;
    $count{ substr($genome, $_, 2) }++ for (0..length($genome)-2);
}

__END__
__C__

int string_inline_c(char *genome, int len)
{
    int i;
    int hash[96];

    /* The hashing function is simply 4*(first char - 'a') + second char - 'a' */
    /* i.e. the bucket for gg is 4*('g'-'a')+'g'-'a' = 30 */

    /*initialize hash buckets which will get used*/
    /*aa*/     /*ac*/     /*ag*/     /*at*/
    hash[ 0] = hash[ 2] = hash[ 6] = hash[19] = 0;
    /*ca*/     /*cc*/     /*cg*/     /*ct*/
    hash[ 8] = hash[10] = hash[14] = hash[27] = 0;
    /*ga*/     /*gc*/     /*gg*/     /*gt*/
    hash[24] = hash[26] = hash[30] = hash[43] = 0;
    /*ta*/     /*tc*/     /*tg*/     /*tt*/
    hash[76] = hash[78] = hash[82] = hash[95] = 0;

    for(i=0;i<len-1;i++)
    {
        hash[4*(genome[i]-'a')+(genome[i+1]-'a')]++;
    }

    /* returning the proper perl hash is left as an */
    /* exercise for the reader */
    /* see also the Inline-C Cookbook */
    return(1);
}
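
For anyone who wants to sanity-check the bucket scheme outside of Inline, here is a rough standalone C sketch of the same counting loop. The names pair_bucket, count_pairs and print_pairs are made up for illustration and aren't part of the benchmarked script. It assumes the input contains only lowercase a, c, g and t, and it zeroes all 96 slots instead of just the 16 that get used.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: same formula as the post,
   4*(first char - 'a') + (second char - 'a'). */
int pair_bucket(char first, char second)
{
    return 4 * (first - 'a') + (second - 'a');
}

/* Count every overlapping two-character window of genome.
   Assumes only lowercase a, c, g, t appear, so the largest
   index is the "tt" bucket, 95. */
void count_pairs(const char *genome, size_t len, long counts[96])
{
    size_t i;

    for (i = 0; i < 96; i++)
        counts[i] = 0;

    for (i = 0; i + 1 < len; i++)
        counts[pair_bucket(genome[i], genome[i + 1])]++;
}

/* Print the 16 buckets that can actually be hit, one pair per line. */
void print_pairs(const long counts[96])
{
    const char bases[] = "acgt";
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            printf("%c%c %ld\n", bases[i], bases[j],
                   counts[pair_bucket(bases[i], bases[j])]);
}
```

Call count_pairs(genome, length, counts) with a long counts[96] array, then print_pairs(counts) to dump the totals. The formula only works because the 16 possible pairs happen to land in 16 distinct slots. Any other character in the input would index outside the array, so real data would need a check first.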

In reply to Inline::C by sleepingsquirrel
in thread how can I speed up this perl?? by Anonymous Monk
