Well spotted, sauoq! There are indeed bugs:

  1. There was a bug in the implementation of skeeve2
  2. There was a bug in my correction algorithm
Fixing them does not significantly increase calculation time. For those who are interested, here is the code to fix ikegami's benchmark:

sub skeeve2 {
    my %count;
    for my $i (0..length($seq)-GROUP_LENGTH) {
        $count{substr($seq, $i, GROUP_LENGTH)}++;
    }
    my @keys = keys %count;
    foreach my $key (@keys) {
        for my $i (1..GROUP_LENGTH-1) {
            $count{substr($key, 0, $i)} += $count{$key};
        }
    }
    for my $i (1..GROUP_LENGTH-1) {
        for my $j ($i..GROUP_LENGTH-1) {
            ++$count{substr($seq, -$j, $i)};
        }
    }
    1;
}

sub skeeve3 {
    my %count;
    $count{"$1$2"}++ while $seq =~ /(.)(?=(..))/g;
    my @keys = keys %count;
    foreach my $key (@keys) {
        for my $i (1..GROUP_LENGTH-1) {
            $count{substr($key, 0, $i)} += $count{$key};
        }
    }
    for my $i (1..GROUP_LENGTH-1) {
        for my $j ($i..GROUP_LENGTH-1) {
            ++$count{substr($seq, -$j, $i)};
        }
    }
    1;
}

sub skeeve3_i {
    my %count;
    $count{$1}++ while $seq =~ /(?=(.{3}))/g;
    my @keys = keys %count;
    foreach my $key (@keys) {
        for my $i (1..GROUP_LENGTH-1) {
            $count{substr($key, 0, $i)} += $count{$key};
        }
    }
    for my $i (1..GROUP_LENGTH-1) {
        for my $j ($i..GROUP_LENGTH-1) {
            ++$count{substr($seq, -$j, $i)};
        }
    }
    1;
}
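For anyone who wants to convince themselves that the correction step gives the right numbers, here is a minimal sketch that compares it against plain brute-force counting. The names count_brute_force, count_with_correction and same_counts are made up for this illustration and are not part of ikegami's benchmark. The sequence is passed as an argument instead of using the benchmark's $seq, and the input is assumed to be at least GROUP_LENGTH characters long.

```perl
use strict;
use warnings;

use constant GROUP_LENGTH => 3;

# Reference: count every substring of length 1..GROUP_LENGTH directly.
sub count_brute_force {
    my ($seq) = @_;
    my %count;
    for my $len (1 .. GROUP_LENGTH) {
        for my $i (0 .. length($seq) - $len) {
            $count{ substr($seq, $i, $len) }++;
        }
    }
    return \%count;
}

# Count only the full-length groups, then derive the shorter ones.
# Assumes length($seq) >= GROUP_LENGTH.
sub count_with_correction {
    my ($seq) = @_;
    my %count;
    for my $i (0 .. length($seq) - GROUP_LENGTH) {
        $count{ substr($seq, $i, GROUP_LENGTH) }++;
    }
    # Every full group contributes once to each of its prefixes.
    my @keys = keys %count;
    for my $key (@keys) {
        for my $i (1 .. GROUP_LENGTH - 1) {
            $count{ substr($key, 0, $i) } += $count{$key};
        }
    }
    # The last GROUP_LENGTH-1 characters never start a full group,
    # so the short groups starting there are added by hand.
    for my $i (1 .. GROUP_LENGTH - 1) {
        for my $j ($i .. GROUP_LENGTH - 1) {
            ++$count{ substr($seq, -$j, $i) };
        }
    }
    return \%count;
}

# True if both hash refs hold the same keys with the same counts.
sub same_counts {
    my ($x, $y) = @_;
    return 0 unless keys %$x == keys %$y;
    for my $k (keys %$x) {
        return 0 unless exists $y->{$k} && $x->{$k} == $y->{$k};
    }
    return 1;
}

my $demo = 'ACGTACGGTA';
print same_counts(count_brute_force($demo), count_with_correction($demo))
    ? "counts agree\n"
    : "counts differ\n";
```

The brute-force version walks the sequence once per group length, while the corrected version walks it once and then only touches the (usually far fewer) distinct keys.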

I'll fix my algorithm outlined in my other post.

Regarding your other issues:

  1. including the file reads in your benchmark obscures the issue. As long as the machine has plenty of memory (and maybe yours doesn't) you are contaminating your results with two different methods of reading the data.

    I don't agree. It was intended that way.

    Reading chunks of data is important if you are low on memory. And I'm 100% sure the OP has to read his data from a file. So reading data is something that is essential for the algorithm.

  2. you could modify my algorithm and most of the others' to work with chunks as well if RAM really was an issue.

    Here I do agree. Of course it can be done. But you didn't ;-)

    I challenge you to do so and then let's compare times.

    But OTOH: my algorithm that takes low memory into account is the same as a2, the one ikegami implemented as "skeeve2". So the only difference should be the time needed for reading the data. And if we all do so, and if we all read the same chunk size, there shouldn't be a difference in the ranking.

    Of course, all of you could rewrite your code to use my correction algorithm. I think this would make some of your algorithms significantly faster.
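To make the chunk discussion a bit more concrete, here is a rough sketch of how counting could be combined with chunked reading. It is not the code from the benchmark: count_chunked, the chunk size of 4 and the in-memory test data are made up for illustration. It also assumes that line breaks are not part of the data and that the input holds at least GROUP_LENGTH-1 characters. The last GROUP_LENGTH-1 characters of each chunk are carried over so that groups spanning a chunk boundary are still counted exactly once.

```perl
use strict;
use warnings;

use constant GROUP_LENGTH => 3;

# Count all groups of length 1..GROUP_LENGTH, reading at most
# $chunk_size characters from $fh at a time.
sub count_chunked {
    my ($fh, $chunk_size) = @_;
    my (%count, $buf);
    my $carry = '';

    while (read($fh, $buf, $chunk_size)) {
        $buf =~ tr/\r\n//d;    # assumption: line breaks are not data
        my $window = $carry . $buf;
        for my $i (0 .. length($window) - GROUP_LENGTH) {
            $count{ substr($window, $i, GROUP_LENGTH) }++;
        }
        # Keep the tail so groups crossing the boundary are not lost.
        $carry = length($window) >= GROUP_LENGTH
            ? substr($window, 1 - GROUP_LENGTH)
            : $window;
    }

    # Same correction as before: prefixes of the full groups ...
    my @keys = keys %count;
    for my $key (@keys) {
        for my $i (1 .. GROUP_LENGTH - 1) {
            $count{ substr($key, 0, $i) } += $count{$key};
        }
    }
    # ... plus the short groups at the very end of the data, which
    # are exactly what is left in $carry after the last chunk.
    if (length($carry) >= GROUP_LENGTH - 1) {
        for my $i (1 .. GROUP_LENGTH - 1) {
            for my $j ($i .. GROUP_LENGTH - 1) {
                ++$count{ substr($carry, -$j, $i) };
            }
        }
    }
    return \%count;
}

# Demo on an in-memory filehandle; a real run would open the data file.
my $data = "ACGTACGGTA\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $count = count_chunked($fh, 4);
close $fh;

printf "%-3s %d\n", $_, $count->{$_} for sort keys %$count;
```

Only one chunk plus the small carry is held in memory at any time, which is the point of reading in chunks when RAM is tight.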

s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

In reply to Re^3: Question about speeding a regexp count by Skeeve
in thread Question about speeding a regexp count by Commander Salamander
