
Dear BrowserUk, thank you very much for your concern. 1- The faster XOR version needed 10 seconds to return the matches for one 18-letter key against 30274277 letters with 4 mismatches, but the C code handles two 18-letter keys against the same data in 5 seconds. The faster XOR version was this:

#! perl -slw
use strict;
use bytes;

our $FUZZY ||= 4;

open KEYS, '<', $ARGV[ 0 ] or die "$ARGV[ 0 ] : $!";
my @keys = <KEYS>;
close KEYS;
chomp @keys;
warn "Loaded ${ \scalar @keys } keys";
my $seq ;
my $seqnam;
readseqfile();
my( $masked, $pos );

my $totalLen = 0;
my $count = 0;

    my $seqLen = length $seq;
    $totalLen += $seqLen;
    for my $key ( @keys ) {
        my $keyLen     = length $key;
        my $mask       = $key x ( int( $seqLen / $keyLen ) + 1 );
        my $maskLen    = length $mask;
        my $minZeros   = chr( 0 ) x int( $keyLen / ( $FUZZY + 1 ) );
        my $minZlen       = length $minZeros;
        for my $offset1 ( 0 .. $keyLen-1 ) { 

            $masked =  $mask ^ substr( $seq, $offset1, $maskLen );
            $pos = 0;
            while( 
                $pos = 1+index  $masked, $minZeros, $pos 
            ) {
                $pos--;
                my $offset2 = $pos - ($pos % $keyLen );
                last unless $offset1 + $offset2 + $keyLen <= $seqLen;
                
                my $fuz = $keyLen 
                    - ( substr( $masked, $offset2, $keyLen ) =~ tr[\0][\0] );
                if( $fuz <= $FUZZY ) {
                    #printf "\tFuzzy matched key:'$key' -v- '%s' in line:"
                    #    .  "%2d @ %6d (%6d+%6d) with fuzziness: %d\n", 
                    #    substr( $seq, $offset1 + $offset2, $keyLen ),
                    #    $., $offset1 + $offset2, $offset1, $offset2, $fuz;
                }
                $pos = $offset2 + $keyLen;
            }
        }
    }

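# $. is reset by close(), so it cannot be used as a sequence count here;
# readseqfile() keeps only the last sequence in the file.
warn "\n\nProcessed 1 sequence of length $totalLen";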

    sub readseqfile {
        open( SEQ, '<', $ARGV[1] ) or die "$ARGV[1] : $!";
        while (<SEQ>) {
            chomp();
            if (/>(\S+)/) {

                $seqnam = $1;
                $seq = "";
            }
            else {
                $seq .= $_;
            }
        }
        close SEQ;
    }
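
The `$minZeros` prefilter in the script above rests on a pigeonhole argument: a window with at most `$FUZZY` mismatches must contain a run of at least `int( $keyLen / ( $FUZZY + 1 ) )` matching bytes. A minimal sketch of what that gives for keys of 10 to 25 letters at about 25% mismatches follows. The helper name `min_run` and the rounding down of 25% are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the all-zero run that index() searches for,
# using the same formula as $minZeros in the script.
sub min_run {
    my ( $keyLen, $fuzzy ) = @_;
    return int( $keyLen / ( $fuzzy + 1 ) );
}

for my $keyLen ( 10, 18, 25 ) {
    # Assumption: "about 25%" is rounded down to a whole number of mismatches.
    my $fuzzy = int( $keyLen * 0.25 );
    printf "key length %2d: up to %d mismatches, guaranteed matching run >= %d\n",
        $keyLen, $fuzzy, min_run( $keyLen, $fuzzy );
}
```

For these three key lengths the guaranteed run is only 3 bytes, so the `index` search may hit very often on a four-letter alphabet and filter out little; that is a possible contributor to the run time, not a measured one.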

2- How big: the largest data set is the wheat genome, 12 gigabytes, split across multiple files of about 30 megabytes each.

3- How short: the keys are 10 to 25 letters long, and the number of mismatches allowed is about 25% of the key length.

4- As mentioned, the faster XOR version above took 10 seconds just to report the matches with 4 mismatches for one 18-letter key against 30274277 letters.

5- "What about it couldn't you handle?": A- About XOR: yes, I could handle it, but I noticed that the line where the code slices the genome (the "target") for comparison takes too long, about 1 second:

$masked =  $mask ^ substr( $seq, $offset1, $maskLen );
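
To make clear what that line does on a small scale, here is a minimal sketch of the same XOR-and-count idea for a single window. The helper name `mismatches` and the sample strings are made up for illustration; only the string XOR (`^`) and the `tr` count come from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: count the mismatches between $key and the window of
# $seq that starts at $offset and has the same length as $key.
sub mismatches {
    my ( $seq, $key, $offset ) = @_;
    my $keyLen = length $key;
    die "window runs past the end of the sequence"
        if $offset + $keyLen > length $seq;
    my $window = substr( $seq, $offset, $keyLen );
    my $xor    = $key ^ $window;    # equal bytes XOR to "\0"
    # tr with an empty replacement list only counts the NUL bytes.
    return $keyLen - ( $xor =~ tr/\0// );
}

my $seq = 'ACGTACGTTTGA';           # made-up sample data
my $key = 'ACGA';
for my $offset ( 0, 4, 8 ) {
    printf "offset %d: %d mismatch(es)\n",
        $offset, mismatches( $seq, $key, $offset );
}
```

The full script applies the same operation to the whole sequence at once, which means `substr` copies nearly the entire 30-megabyte string and the XOR builds another string of that size for every `$offset1`.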

B- About the C code: I only want to know exactly which method it uses, whether there are Perl modules like it, and whether I can take just the fuzzy-search part (library/code) of that code and integrate it into my own code as an external component. Why is his code faster? Yes, I have looked at everyone who asked the same question as me, but is this really all Perl can offer, half the speed of C, or is there more?

In reply to Re^2: Again Fuzzy regex !!!! by samman
in thread Again Fuzzy regex !!!! by samman
