in reply to Re^2: Again Fuzzy regex !!!!
in thread Again Fuzzy regex !!!!

the faster XOR could return matches for one 18-letter string against 30274277 with 4 mismatches in 10 seconds, but the C code could do two 18-letter strings against the same data in 5 seconds

Yes. That is about as good as you will get from pure Perl code. C will usually be faster.

There is no point working out what algorithm the C code is using, because if you reimplemented it in Perl it would be much slower.

Perl is very poor at handling strings on a byte-by-byte basis -- you need to call a function (substr) to get at each and every byte, whereas C need only increment an address register.
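
To make that cost concrete, here is a minimal sketch of the byte-by-byte approach in Perl. The sub name mismatches_by_substr and the sample strings are made up for illustration; the point is only that every comparison costs two substr calls, where a C loop would just step a pointer.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions at which two equal-length
# strings differ, calling substr once per character on each string.
sub mismatches_by_substr {
    my ( $x, $y ) = @_;
    my $count = 0;
    for my $i ( 0 .. length($x) - 1 ) {
        $count++ if substr( $x, $i, 1 ) ne substr( $y, $i, 1 );
    }
    return $count;
}

print mismatches_by_substr( 'ACGTACGT', 'ACGAACGA' ), "\n";    # prints 2
```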

My XOR code that you've reposted above plays to Perl's strengths by using single op-codes on the long strings to perform the majority of the processing; but in the end you still need to call substr many times to extract the matches, and substr is very slow.
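
For readers who have not seen the trick, the following is a simplified sketch of the idea, not the code reposted in this thread. It XORs each window of the text against the pattern in one operation and lets tr count the NUL bytes (the positions that matched). The sub name fuzzy_xor, its arguments and the sample data are assumptions made for illustration, and it still pays for a substr call per window.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, matched text] for every window of
# $text that differs from $pattern in at most $max_mismatches positions.
sub fuzzy_xor {
    my ( $text, $pattern, $max_mismatches ) = @_;
    my $len = length $pattern;
    my @hits;
    for my $offset ( 0 .. length($text) - $len ) {
        # String XOR: equal characters give "\0", differing ones do not.
        my $diff = substr( $text, $offset, $len ) ^ $pattern;
        # tr with an empty replacement list only counts the "\0" bytes.
        my $mismatches = $len - ( $diff =~ tr/\0// );
        push @hits, [ $offset, substr( $text, $offset, $len ) ]
            if $mismatches <= $max_mismatches;
    }
    return @hits;
}

# Prints "1 ACGT" and "6 ACGA", one per line.
printf "%d %s\n", @$_ for fuzzy_xor( 'AACGTTACGA', 'ACGT', 1 );
```

The XOR and the tr are each a single op-code working on the whole window, so the per-character work happens inside perl's C internals rather than in a Perl-level loop.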

If you need to stick to Perl, what you have is about as good as it is likely to get. If you need faster, then you'll have to bite the bullet and learn C.

