in reply to Re^5: The sum of absolute differences in the counts of chars in two strings.
in thread The sum of absolute differences in the counts of chars in two strings.
It might as you say appear to be “brute force,” but, a digital computer is a brutally-powerful machine.The problem space is small and unchanging, and microseconds matter. The overhead of “fancy pants” data structures is pointless and unwanted. The algorithm you wrote runs as fast as it can, such that if you need it to run faster you just have to increase the clock-speed.
It is exactly the above kind of pat, assumptive, bland, non-sequitous, utterly meaningless drivel that you continuously spout, that pretty much guarantees that there'll be no hatchet burying ceremony any time soon.
For an apparently intelligent man with (apparently) lots of computer experience, it beggars belief that you haven't understood the concept that a good algorithm (well implemented) always trumps brute force.
Searching for one string inside another string is, according to your logic above, something that can only run as fast as the machine will let it.
C runtime libraries implement strstr() something like this:
char *strstr(char *s1, *s2 ) { register char *p = s1; extern char *strchr(); extern int strncmp(); register int len = strlen( s2 ); for( ; (p = strchr( p, *s2 ) ) != 0; p++ ) { if( strncmp( p, s2, len ) == 0 ) { return (p); } } return (0); }
If you're lucky, then a descent compiler will optimise away (inline) the incestuous function calls in that and may even fold away the multiple calls to strlen() that result, but it is fairly easy to code a solution -- assuming the same constraints and failure modes -- that will beat that implementation.
So, you would conclude, brute force wins.
But then what of Boyer-moore, Rabin-Karp, Knuth-Morris-Pratt and Bitap algorithms?
By your conclusions, a brute force C implementation something like this:
// Assumes equal length ascii strings and no embedded nulls. int fuzzyMatch( char *a, char *b ) { int sum = 0; while( *a ) if( !( *a++ - *b++ ) ) ++sum; return sum; }
Should win hands down. And be much quicker than anything one might write in Perl.
And yet, if we compare that with (the right) pure perl implementation:
( $s[ $ai ] ^ $s[ $_ ] ) =~ tr[\0][\0]
We get the following benchmark results:
C:\test>fuzzy-b Rate a b a 7.33/s -- -20% b 9.16/s 25% --
What if I tell you that the pure Perl implementation is tagged 'b' above. That the two-pass, exclusive-OR the two strings and then count the nulls (in Perl) runs 25% faster than the one-pass, pretty optimal, brute force C version? Surprised? Disbelieving?
Try it for yourself. I very much look forward to your sage analysis and profound wisdom on how this could possibly be so.
Benchmark code:
#! perl -slw use strict; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => 'fuzzy-b', CLEAN_AFTER_BUILD => 0; // Assumes equal length ascii strings and no embedded nulls. int fuzzyMatch( char *a, char *b ) { int sum = 0; while( *a ) if( !( *a++ - *b++ ) ) ++sum; return sum; } END_C use Data::Dump qw[ pp ]; use Benchmark qw[ cmpthese ]; chomp( our @s = <DATA> ); our $I //= -1; cmpthese $I, { a => q[ my $total = 0; for my $ai ( 0 .. $#s ) { $total += fuzzyMatch( $s[ $ai ], $s[ $_ ] ) for $ai+1 .. $ +#s; } print 'a', $total if $I == 1; ], b => q[ my $total = 0; for my $ai ( 0 .. $#s ) { $total += ( $s[ $ai ] ^ $s[ $_ ] ) =~ tr[\0][\0] for $ai+1 + .. $#s; } print 'b', $total if $I == 1; ], }; __DATA__ aaaaaaaaaaacccg aaaaaaaaaaccccg aaaaaaacccccccc aaaaaaaaacccccg aaaaaaaaaaaccgg aaaaaaaaaacccgg aaaaaaacccccccg aaaaaaaaaccccgg aaaaaaaaaaccggg aaaaaaaacccccgg aaaaaaccccccccg aaaaaaaaacccggg aaaaacccccccccg aaaaaaaccccccgg aaaaccccccccccg aaaaaaaaccccggg aaaaaaaaaacgggg aaaaaacccccccgg aaaaaaaaaacccct aaaaaaaaaccgggg aaaaaccccccccgg aaaaaaacccccggg aaaaaaaaaccccct aaaaaaaaaaaccgt aaaacccccccccgg aaaaaaaacccgggg aaaaaaccccccggg aaaaaaaacccccct aaaaaaaaaaaaggt aaaaaaaaaacccgt aaaccccccccccgg aaaaacccccccggg aaaaaaaaacggggg aaaaaaaccccgggg aaaaaaaccccccct aaaaaaaaaccccgt aaaaaaaaaaacggt aaaaaaaaccggggg aaaaccccccccggg aaaaaacccccgggg aaaaaacccccccct aaaaaaaacccccgt aaaaaaaaaaccggt aaaaaccccccgggg aaacccccccccggg aaaaaaacccggggg aaaaaaaccccccgt aaaaaccccccccct aaaaaaaaacccggt aaaaaaaaaaagggt aaaaaaaacgggggg aaaacccccccgggg aaccccccccccggg aaaaaaccccggggg aaaaaaaaccccggt aaaaaaaaaacgggt aaaaaacccccccgt aaaacccccccccct aaaaacccccggggg aaaaaaaccgggggg aaaccccccccgggg aaaaaccccccccgt aaaaaaacccccggt aaaaaaaaaccgggt aaaaccccccggggg aaaaaaaaggggggg aacccccccccgggg aaaaaacccgggggg aaaaaaaacccgggt aaaaaaaaaaggggt aaaacccccccccgt aaaaaaccccccggt aacccccccccccct aaacccccccggggg accccccccccgggg aaaaaccccgggggg aaaaaaaaaaccctt aaaaaaacggggggg aaaaacccccccggt aaaaaaaccccgggt aaaccccccccccgt aaaaaaaaacggggt aaccccccccggggg aaaaaaaaacccctt aaaacccccgggggg aaaaaaccggggggg aaaaaaaaaaacgtt aacccccccccccgt aaaaaacccccgggt aaaaaaaaccggggt aaaaccccccccggt aaaaaaaaccccctt aaaaaaagggggggg acccccccccggggg aaaccccccgggggg aaaaacccggggggg aaaaaaaaaaccgtt aaacccccccccggt aaaaaaacccggggt aaaaaccccccgggt aaaaaaaaagggggt aaaaaaacccccctt aacccccccgggggg aaaaaacgggggggg aaaaaaaaacccgtt aaaaaaaaaaaggtt aaaaccccggggggg aaccccccccccggt aaaaaaccccggggt aaaaaaaacgggggt aaaacccccccgggt aaaaaaaaccccgtt aaaaaaaaaacggtt accccccccgggggg aaaaaccgggggggg aaacccccggggggg aaaaaaccccccctt aaaccccccccgggt acccccccccccggt aaaaacccccggggt aaaaaaaccgggggt aaaacccgggggggg aaaaaaacccccgtt aaaaaaaaaccggtt aaaaaaggggggggg cccccccccgggggg aaccccccggggggg aaaaacccccccctt aaaaaacccgggggt aaaaaaaaggggggt aaaaccccccggggt aacccccccccgggt acccccccggggggg aaaccccgggggggg aaaaaaaaaagggtt aaaaaaaacccggtt aaaaaaccccccgtt aaaaacggggggggg aaaaccccccccctt aaaaaccccgggggt aaacccccccggggt accccccccccgggt aaaaaaacggggggt aaaaacccccccgtt aaaaccggggggggg aacccccgggggggg aaacccccccccctt aaaaaaaaacgggtt ccccccccggggggg aaaaaaaccccggtt aaaaaaaaaaacttt aaccccccccggggt aaaaaaccggggggt aaaacccccgggggt aaacccggggggggg accccccgggggggg aaaaccccccccgtt aaccccccccccctt aaaaaaaaccgggtt aaaaaacccccggtt aaaaaaaaaaccttt aaaaacccggggggt aaaaaaagggggggt aaaccccccgggggt acccccccccggggt aaccccggggggggg aaaacgggggggggg aaaaaccccccggtt aaacccccccccgtt aaaaaaacccgggtt cccccccgggggggg aaaaaaaaaggggtt aaaaaaaaacccttt aaaaccccggggggt aaaaaacgggggggt aacccccccgggggt aaccccccccccgtt aaaacccccccggtt aaaaaaaacggggtt aaaccgggggggggg aaaaaaccccgggtt acccccggggggggg aaaaaaaaaacgttt aaaaaaaaccccttt aaaaaccgggggggt aaacccccggggggt accccccccgggggt aaaaacccccgggtt acccccccccccgtt aaaaaaaccggggtt aaaccccccccggtt aacccgggggggggg aaaaaaaaaccgttt aaccccccggggggt cccccccccgggggt aaaaaaggggggggt aaaaaaacccccttt aaaacccgggggggt accccgggggggggg aacccccccccggtt aaaaaacccggggtt aaaaccccccgggtt aaaaaaaagggggtt aaaaaaccccccttt aaaaaaaaaaggttt aaaaacggggggggt aaaccccgggggggt aaaaaaaacccgttt acccccccggggggt aaaaaccccggggtt aaaaaaacgggggtt aaccggggggggggg accccccccccggtt aaacccccccgggtt aaaaaaaccccgttt ccccccccggggggt aaaaccggggggggt aacccccgggggggt aaaaaaaaacggttt aaaaacccccccttt aaccccccccgggtt cccccccccccggtt aaaaaaccgggggtt aaaacccccggggtt aaacccggggggggt aaaaagggggggggt accccccgggggggt aaaaaacccccgttt aaaaaaaaccggttt aaaaccccccccttt aaaccccccggggtt acccccccccgggtt aaaaacccgggggtt aaaaaaaggggggtt cccccccgggggggt aaaaaaacccggttt aaaaaaaaagggttt aaaaaccccccgttt aaacccccccccttt aaaacgggggggggt aaccccggggggggt aaaaccccgggggtt aaaaaacggggggtt ccccccccccgggtt aacccccccggggtt aaaccgggggggggt aaccccccccccttt aaaaaaaacgggttt aaaacccccccgttt acccccggggggggt aaaaaaccccggttt aaaaaaaaaactttt aaaaaccggggggtt accccccccggggtt aaacccccgggggtt aaaccccccccgttt aaaaaaaccgggttt aaaaggggggggggt aaaaacccccggttt aacccgggggggggt ccccccggggggggt cccccccccggggtt aaaaaagggggggtt aaccccccgggggtt aaaaaaaaacctttt aaaacccggggggtt aaaaccccccggttt aaaaaacccgggttt aacccccccccgttt aaacggggggggggt aaaaaaaaggggttt accccgggggggggt aaaaaaaaaagtttt aaaccccggggggtt aaaaaaaaccctttt acccccccgggggtt aaaaacgggggggtt aaaaaccccgggttt aaccggggggggggt accccccccccgttt aaacccccccggttt aaaaaaacggggttt ccccccccgggggtt aaaaaaacccctttt aaaaccgggggggtt aacccccggggggtt aaaaaaaaacgtttt aaaacccccgggttt aaaaaaccggggttt acccggggggggggt aaccccccccggttt aaaaaaccccctttt aaaaaggggggggtt aaaaaaaaccgtttt accccccggggggtt aaacccgggggggtt acccccccccggttt aaaaacccggggttt aaaccccccgggttt aaaaaaagggggttt ccccggggggggggt aacgggggggggggt cccccccggggggtt aaaaaaaaaggtttt aaaacggggggggtt aaaaacccccctttt aaccccgggggggtt aaaaaaacccgtttt aacccccccgggttt aaaaaacgggggttt ccccccccccggttt aaaaccccggggttt aaaaaaaacggtttt aaaccggggggggtt aaaaccccccctttt aaaaaaccccgtttt acccccgggggggtt aaaaaccgggggttt aaacccccggggttt accccccccgggttt aaaaaaaccggtttt aaacccccccctttt aaaaacccccgtttt ccccccgggggggtt aaaagggggggggtt aacccggggggggtt acggggggggggggt aaaaaaggggggttt aaaacccgggggttt cccccccccgggttt aaccccccggggttt accccggggggggtt aaccccccccctttt aaaaaaaagggtttt aaaaccccccgtttt aaaaaacccggtttt aaacgggggggggtt aaaccccgggggttt acccccccggggttt aaaaacggggggttt aaaaaaacgggtttt aaaaaccccggtttt aaccgggggggggtt cccccggggggggtt aaacccccccgtttt acccccccccctttt aaaaccggggggttt aacccccgggggttt ccccccccggggttt aaaaaaaaacttttt aaccccccccgtttt acccgggggggggtt ccccccccccctttt aaaaaaccgggtttt aaaacccccggtttt aaaggggggggggtt aaaaaaaaccttttt aaaaagggggggttt aaacccggggggttt accccccgggggttt acccccccccgtttt aacggggggggggtt ccccgggggggggtt aaaaacccgggtttt aaaaaaaggggtttt aaaccccccggtttt aaccccggggggttt cccccccgggggttt aaaaaaaaagttttt aaaacgggggggttt aaaaaaacccttttt ccccccccccgtttt accggggggggggtt aacccccccggtttt aaaaaacggggtttt aaaaccccgggtttt aaaaaaaacgttttt acccccggggggttt aaaccgggggggttt aaaaaaccccttttt accccccccggtttt aagggggggggggtt aaacccccgggtttt aaaaaccggggtttt aacccgggggggttt aaaaggggggggttt aaaaacccccttttt ccccccggggggttt aaaaaaaccgttttt aaaaaagggggtttt acgggggggggggtt aaccccccgggtttt cccccccccggtttt aaaacccggggtttt aaaaccccccttttt accccgggggggttt aaaaaaaaggttttt aaaaaacccgttttt aaacggggggggttt ccgggggggggggtt acccccccgggtttt aaaccccggggtttt aaaaacgggggtttt aaaaaccccgttttt aaacccccccttttt cccccgggggggttt aaccggggggggttt aaaaaaacggttttt aacccccggggtttt aaaaccgggggtttt ccccccccgggtttt acccggggggggttt aaaacccccgttttt aaccccccccttttt aaaaaaccggttttt aaagggggggggttt aaacccgggggtttt accccccggggtttt aaaaaggggggtttt ccccggggggggttt aaaccccccgttttt acccccccccttttt aaaaaaagggttttt aacgggggggggttt aaaaacccggttttt aaaacggggggtttt aaccccgggggtttt cccccccggggtttt aaaaccccggttttt accgggggggggttt ccccccccccttttt aaaaaacgggttttt aacccccccgttttt acccccgggggtttt aaaaaaaactttttt aaaccggggggtttt aaacccccggttttt cccgggggggggttt aaggggggggggttt accccccccgttttt aaaaaccgggttttt aaaagggggggtttt aacccggggggtttt aaaaaaacctttttt ccccccgggggtttt aaaaaaggggttttt cccccccccgttttt acggggggggggttt aaccccccggttttt aaaacccgggttttt accccggggggtttt aaacgggggggtttt aaaaaaaagtttttt aaaaaaccctttttt acccccccggttttt ccggggggggggttt aaaaacggggttttt aaaccccgggttttt aaaaaaacgtttttt aaaaacccctttttt cccccggggggtttt aaccgggggggtttt aaaaccggggttttt ccccccccggttttt aacccccgggttttt aaaggggggggtttt aaaaccccctttttt acccgggggggtttt aaaaaaccgtttttt aaaaagggggttttt cgggggggggggttt accccccgggttttt aaacccggggttttt aaacccccctttttt aacggggggggtttt aaaaacccgtttttt ccccgggggggtttt aaaaaaaggtttttt cccccccgggttttt aaccccggggttttt aaaacgggggttttt aaccccccctttttt aaaaaacggtttttt accggggggggtttt aaaaccccgtttttt aaaccgggggttttt acccccggggttttt acccccccctttttt aagggggggggtttt aaaaaccggtttttt cccggggggggtttt aaacccccgtttttt aacccgggggttttt aaaaggggggttttt ccccccggggttttt aaaaaagggtttttt aaaacccggtttttt acgggggggggtttt aaccccccgtttttt aaaaaaaattttttt aaacggggggttttt accccgggggttttt aaaccccggtttttt ccgggggggggtttt acccccccgtttttt aaaaacgggtttttt aaccggggggttttt aaaaaaacttttttt cccccgggggttttt aaaaccgggtttttt ccccccccgtttttt aggggggggggtttt aacccccggtttttt aaaaaaccttttttt acccggggggttttt aaagggggggttttt cggggggggggtttt aaaaaggggtttttt aaacccgggtttttt accccccggtttttt aacgggggggttttt ccccggggggttttt aaaaacccttttttt aaaaaaagttttttt aaccccgggtttttt cccccccggtttttt aaaacggggtttttt aaaaaacgttttttt accgggggggttttt aaaaccccttttttt acccccgggtttttt aaaccggggtttttt cccgggggggttttt aaaaaccgttttttt aaggggggggttttt aaacccccttttttt ccccccgggtttttt aacccggggtttttt aaaagggggtttttt aaaacccgttttttt aaaaaaggttttttt acggggggggttttt aaccccccttttttt accccggggtttttt aaacgggggtttttt aaaaacggttttttt ccggggggggttttt aaaccccgttttttt acccccccttttttt cccccggggtttttt aaccgggggtttttt aaaaccggttttttt agggggggggttttt ccccccccttttttt aacccccgttttttt aaaggggggtttttt acccgggggtttttt cgggggggggttttt aaacccggttttttt accccccgttttttt aaaaagggttttttt ccccgggggtttttt aacggggggtttttt aaaaaaatttttttt cccccccgttttttt aaccccggttttttt aaaacgggttttttt accggggggtttttt aaaaaactttttttt ggggggggggttttt acccccggttttttt aaaccgggttttttt cccggggggtttttt aaaaacctttttttt aagggggggtttttt aaaaggggttttttt ccccccggttttttt aacccgggttttttt aaaaaagtttttttt acgggggggtttttt aaaaccctttttttt aaacggggttttttt accccgggttttttt ccgggggggtttttt aaaaacgtttttttt aaacccctttttttt aaccggggttttttt cccccgggttttttt aaaaccgtttttttt aaccccctttttttt aggggggggtttttt acccggggttttttt aaagggggttttttt aaacccgtttttttt cggggggggtttttt acccccctttttttt aaaaaggtttttttt ccccggggttttttt aacgggggttttttt aaaacggtttttttt aaccccgtttttttt ccccccctttttttt accgggggttttttt aaaccggtttttttt gggggggggtttttt acccccgtttttttt aaggggggttttttt cccgggggttttttt aaaagggtttttttt aacccggtttttttt ccccccgtttttttt acggggggttttttt accccggtttttttt aaacgggtttttttt aaaaacttttttttt ccggggggttttttt cccccggtttttttt aaccgggtttttttt aaaaccttttttttt agggggggttttttt acccgggtttttttt aaaggggtttttttt aaaaagttttttttt aaacccttttttttt cgggggggttttttt ccccgggtttttttt aacggggtttttttt aaaacgttttttttt aaccccttttttttt accggggtttttttt aaaccgttttttttt acccccttttttttt aagggggtttttttt cccggggtttttttt aacccgttttttttt aaaaggttttttttt acgggggtttttttt aaacggttttttttt accccgttttttttt ccgggggtttttttt cccccgttttttttt aaccggttttttttt aggggggtttttttt aaagggttttttttt acccggttttttttt cggggggtttttttt aacgggttttttttt ccccggttttttttt aaaactttttttttt accgggttttttttt aaacctttttttttt gggggggtttttttt aaggggttttttttt cccgggttttttttt aaaagtttttttttt aaccctttttttttt acggggttttttttt aaacgtttttttttt acccctttttttttt ccggggttttttttt aaccgtttttttttt agggggttttttttt acccgtttttttttt aaaggtttttttttt aacggtttttttttt accggtttttttttt ggggggttttttttt aagggtttttttttt cccggtttttttttt acgggtttttttttt aaacttttttttttt ccgggtttttttttt aaccttttttttttt aggggtttttttttt cggggtttttttttt aacgttttttttttt accgttttttttttt gggggtttttttttt aaggttttttttttt ccggttttttttttt cgggttttttttttt acctttttttttttt ggggttttttttttt
What I have a problem with in your posts is the same thing now as when I wrote this back in February. You are still trotting out "potted wisdoms". Usually completely inappropriate to the subject, frequently totally misleading and incorrect, never backed by research or code, nor even citing any actual, tangible evidence.
If you trace our interactions back, you'll (re-)discover that in my earlier attempts to get you to start considering the negative affects of your posts, I was polite, giving you the benefit of the doubt -- but still you continued. As you continued, so did I. And the tone of my refutations grows in strength over time. An attempt to shock you into actually applying some thought prior to posting. It hasn't had any affect.
So, whilst you feel free to proffer such garbage, I continue to feel free, if not obliged, to take issue with it.
|
|---|