But when a sensible algorithm can locate 1 million bits close to the end of a 1/4 billion bits in 0.25 seconds, who cares if you have to "pedantically growl" ..... And anything that takes longer than 1/4 of a second is pointless.
Out of curiosity, I tested the gmp library's capability with the above scenario - under the assumption that "1/4 billion" is "250 million".
The haystack was pseudo-randomly generated, and the needle starts 19 bits from the left hand end of the haystack.
I used a combination of Math::GMPz and Inline::C.
It took 3 seconds to find the needle in the haystack (starting from the right hand end of the haystack) on Windows 7 64 bit.
In some parts of the world "1/4 billion" is "25 million" and the program then, of course, finds the needle 10 times quicker.
I'm not at all impressed with the code that I wrote in order to check this. But I was impressed that a general usage library could provide such a speedy implementation of the approach.
Unfortunately, it's not quick enough to be pointful ;-)
use strict;
use warnings;
use Math::GMPz qw(:mpz);
use Time::HiRes;
use Inline C => Config =>
USING => 'ParseRegExp',
BUILD_NOISY => 1,
INC => '-IC:/MinGW/msys/1.0/local/include',
LIBS => '-LC:/MinGW/msys/1.0/local/lib -lgmp',
TYPEMAPS => './typemap', # ships with Math::GMPz
;
use Inline C => <<'EOC';
#include <gmp.h>
int start(mpz_t * haystack, int haystack_size,
mpz_t * needle, int needle_size) {
int h_bit, n_bit, nless1 = needle_size -1;
for(h_bit = nless1; h_bit < haystack_size; h_bit++) {
for(n_bit = 0; n_bit < needle_size; n_bit++) {
if(mpz_tstbit(*haystack, h_bit - needle_size + n_bit) !=
mpz_tstbit(needle, n_bit)) break;
if(n_bit == nless1) return h_bit;
}
}
return 0;
}
EOC
my ($t1, $t2, $match);
my ($haystack_size, $needle_size) = (250e6, 1e6);
# generate the haystack - takes a while
my $h_string = rstring($haystack_size);
# generate a needle that's guaranteed to match , starting
# at 19 bits from the lh end - ie at bit 249999981
my $n_string = substr($h_string, 19, $needle_size);
# create the haystack gmp vector. (We need an additional
# leading "1" bit to ensure that leading zero bits aren't ignored.)
my $haystack = Math::GMPz->new( "1" . $h_string , 2);
# create the needle gmp vector. (We need an additional
# leading "1" bit to ensure that leading zero bits aren't ignored.)
my $needle = Math::GMPz->new("1" . $n_string, 2);
# Check that we've got the right number of bits.
print Rmpz_sizeinbase($haystack, 2), " ",
Rmpz_sizeinbase($needle, 2), "\n";
$t1 = Time::HiRes::time();
$match = start($haystack, $haystack_size, $needle, $needle_size);
$t2 = Time::HiRes::time();
print $match, " ", $t2 - $t1, "\n";
sub rstring {
die "haystack size must be a multiple of 10"
if $_[0] % 10;
my $ret;
my $its = $_[0] / 10;
$ret .= sprintf("%010b",int rand 1024) for 1 .. $its;
if(length($ret) != $_[0] || $ret =~ /[^01]/) {
die "Error in sub rstring";
}
return $ret;
}
# Outputs:
# 250000001 1000001
# 249999981 3.16680598258972
Cheers, Rob | [reply] [d/l] |
under the assumption that "1/4 billion" is "250 million".
You picked out a quote from a reply to a "non-combatant"; someone for whom the differences between 'GB', 'Gb' and 'GiB' would be lost.
For the sake of precision, "1/4 billion bits", in this case is, 1024**3/4 == 268435456 bits == 33554432 bytes == 32 MiB.
BTW: searching from the end is an intriguing possibility...but gains play losses if the needle is to be found at the beginning (left-most; least-most; lower) position in the haystack.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] |