Re^2: searching through data

> getting the job done in less than a half of second

While this is certainly a neat approach, and much better memory-wise (as long as the largest possible number stays within reasonable limits), it isn't actually faster than using a hash. The slightly modified code below (changed to make it comparable with the hash version I suggested) takes about the same time to run on my system as the hash version. For example, with 1_000_000 values to look up:

$ time ./757954_bytevector.pl >out

real    0m4.421s
user    0m4.368s
sys     0m0.048s

----------

#!/usr/bin/perl
use strict;
use warnings;

# Create a file with 1_000_000 random numbers to look up
open(my $fh, ">", "in.txt") or die "Cannot write in.txt: $!";
for (1 .. 1_000_000) {
    print $fh int(rand(1e6)), "\n";
}
close $fh;

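my $ids = 1_000_000;    # highest possible id
# One byte per id 0 .. $ids, each initialised to "\0" (not present);
# equivalent to: my $bin = "\0" x ($ids + 1);
my $bin = 0;
substr($bin, $_, 1, pack("c", 0)) for (0 .. $ids);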

# Create the index: mark 400_000 random ids as present ("\1")
for (1 .. 400_000) {
    my $id = int rand 1e6;
    substr($bin, $id, 1, pack("c", 1));
}

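# Look up each number from in.txt and print those present in the index
open(my $fh2, "<", "in.txt") or die "Cannot read in.txt: $!";
while (<$fh2>) {
    my ($num) = m/^(\d+)/ or next;    # skip lines without a number
    print "$num, " if unpack("c", substr($bin, $num, 1)) == 1;
}
close $fh2;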
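The hash version mentioned above isn't reproduced in this post. For comparison, here is only a minimal sketch of what such a hash-based lookup might look like; the helper names build_index and lookup_all are hypothetical, and the sizes simply mirror the byte-vector script (400_000 random ids in the index, 1_000_000 random numbers to look up).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from a list of ids.
# Every present id becomes a key; the values are left undef.
sub build_index {
    my %seen;
    @seen{@_} = ();    # hash slice: create all keys at once
    return \%seen;
}

# Hypothetical helper: return the numbers from @$nums present in the index.
sub lookup_all {
    my ($index, $nums) = @_;
    return grep { exists $index->{$_} } @$nums;
}

# Usage, mirroring the byte-vector script above.
my $index = build_index(map { int rand 1e6 } 1 .. 400_000);
my @found = lookup_all($index, [map { int rand 1e6 } 1 .. 1_000_000]);
print scalar(@found), " numbers found\n";
```

The hash only stores ids that are actually present, so it doesn't care how large the ids can get, at the cost of considerably more memory per entry than one byte.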
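If memory is the bigger concern, one possible variant (not part of the approach discussed here) is to store one bit per id instead of one byte, using Perl's built-in vec(). The sketch below assumes the same id range of 0 .. 1_000_000; the names $bits, mark and is_marked are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One bit per possible id instead of one byte: ids 0 .. 1_000_000
# need 125_001 bytes here rather than 1_000_001.
my $max_id = 1_000_000;
my $bits   = "";
vec($bits, $max_id, 1) = 0;    # pre-extend the string with zero bits

# Hypothetical helper: mark an id as present.
sub mark { my ($id) = @_; vec($bits, $id, 1) = 1; return }

# Hypothetical helper: true (1) if the id has been marked, else 0.
sub is_marked { my ($id) = @_; return vec($bits, $id, 1) }

mark(int rand 1e6) for 1 .. 400_000;

# grep in scalar context returns the number of matches
my $hits = grep { is_marked($_) } map { int rand 1e6 } 1 .. 1_000_000;
print "$hits numbers found\n";
```

Whether this is any faster than the byte vector or the hash would have to be measured; it only reduces the size of the index.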
