in reply to searching through data

Use a hash with your numbers as keys. That way, grepping through the entire array becomes a simple hash lookup.
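As a minimal sketch of the idea (with a hypothetical @wanted list and query value, not the OP's data): a grep has to scan the whole list for every query, while a hash exists-check is a single lookup:

```perl
use strict;
use warnings;

my @wanted = (12, 345, 6789);    # numbers to search for (made-up values)

# linear scan: examines every element, O(n) per query
my $found_by_grep = grep { $_ == 345 } @wanted;

# hash lookup: O(1) per query after building the hash once
my %is_wanted = map { ($_ => 1) } @wanted;
my $found_by_hash = exists $is_wanted{345} ? 1 : 0;

print "grep: $found_by_grep, hash: $found_by_hash\n";    # prints "grep: 1, hash: 1"
```

For repeated lookups, the one-time cost of building the hash quickly pays for itself.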

#!/usr/bin/perl
my @array = map { int rand 1e6 } 1..400000;

# create file with numbers to look up
open my $fh, ">", "in.txt" or die "$!";
for (1..1000000) {
    print $fh int rand 1e6, "\n";
}
close $fh;

my %lookup_table;
$lookup_table{$_}++ for @array;

open my $in, "<", "in.txt" or die "$!";
while (<$in>) {
    my ($num) = m/^(\d+)/;
    print "$num, " if $lookup_table{$num};
}
close $in;

__END__
$ time ./757954.pl >out

real    0m4.141s
user    0m4.004s
sys     0m0.132s

(Memory requirement: approx. 100 MB, or 80 MB if you get rid of the map for the @array initialisation.)

Update: with 300_000_000 rows, it takes about 15 min., which includes creating the 2 GB random data file "in.txt" plus writing a 760 MB output file. (Memory requirement is the same.)

Replies are listed 'Best First'.
Re^2: searching through data
by evaluator (Monk) on Apr 17, 2009 at 08:59 UTC
    Instead of
    my @array = map { int rand 1e6 } 1..400000;
    ...
    $lookup_table{$_}++ for @array;
    one should use something like
    my %lookup_table = map {int rand 1e6 => 1} 1..400000;
    in order to save memory requirements. There is no need to have the list of numbers both in an array and in the hash.
      There is no need to have the list of numbers both in an array and in the hash.

      Well, I just left in the @array to make it easier for the OP to see what is what...  And if the idea is to reduce memory usage, you should definitely also get rid of the map, which would cut it down to 45 MB, as opposed to 108 MB with the map. (map creates all the elements on the stack before assigning them...)

      my %lookup_table;
      $lookup_table{int rand 1e6}++ for 1..400000;            # 45 M
      ---
      my %lookup_table = map {int rand 1e6 => 1} 1..400000;   # 108 M

      Also, using ++ instead of assigning 1 has the added benefit of detecting duplicate numbers, should this ever be of interest...
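A small illustration of that last point (with made-up numbers, not the OP's data): with ++, the hash values double as occurrence counts, so duplicates in the input can be spotted afterwards:

```perl
use strict;
use warnings;

# hypothetical input containing duplicates
my @numbers = (42, 7, 42, 99, 7, 7);

my %lookup_table;
$lookup_table{$_}++ for @numbers;

# any key whose count exceeds 1 appeared more than once
for my $num (sort { $a <=> $b } keys %lookup_table) {
    print "$num seen $lookup_table{$num} time(s)\n"
        if $lookup_table{$num} > 1;
}
# prints:
# 7 seen 3 time(s)
# 42 seen 2 time(s)
```

Assigning a constant 1 instead would still answer "is this number present?", but it throws the multiplicity information away.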