comment on

but I doubt its efficient for my case

I haven't benchmarked it, but I guess a comparison with your regex approach wouldn't turn out all that bad.

A regex pattern match also has to go through the entire string (or up until the first match). And you have to do it N times (21 in your example). The internal regex implementation, however, has the advantage of being fast, because it's written in C.

So it's essentially going through the string once, but slowly (iteration done in Perl), as opposed to going through the string 21 times, but fast...

The XOR method would also lend itself particularly well to being implemented in C (e.g. via Inline::C), in case you should need real speed.

Update: it turns out that even the Perl implementation of the XOR method is more than 4 times as fast:

#!/usr/bin/perl -w
use strict;
use Benchmark "cmpthese";

my $match = "abcdefghijklmnopqrst";
my $matchlen = length($match);
my $linelen = 50;
my @lines;
my $i = 0;
for (1..1e3) {
    my $line = join '', map chr(32+rand 96), 1..$linelen;
    my $s = $match;
    substr($s, $i++ % $matchlen, 1) = 'X';
    substr($line, rand $linelen-$matchlen, $matchlen) = $s;
    push @lines, $line;
}
my @patterns = ($match);
for my $i (0..$matchlen-1) {
    my $p = $match;
    substr($p, $i, 1) = ".";
    push @patterns, qr/$p/;
}

sub xorstr {
    my $line = shift;
    for my $i (0..$linelen-$matchlen) {
        my $s = substr $line, $i, $matchlen;
        my $x = $s ^ $match;
        my $diff = $x =~ tr/\0//c;
        return $s unless $diff > 1;
    }
    
}
sub regex {
    my $line = shift;
    for my $p (@patterns) {
        if ( $line =~ m/($p)/ ) {
            return $1;
        }
    }
}

cmpthese(-5,
    {
        regex  => sub { my $r = regex($_)  for @lines },
        xorstr => sub { my $r = xorstr($_) for @lines },
    }
);

__END__
         Rate  regex xorstr
regex  17.3/s     --   -82%
xorstr 97.3/s   462%     --
[download]

In reply to Re^3: generating hash patterns for searching with one mismatch by Eliya
in thread generating hash patterns for searching with one mismatch by cedance

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.