in reply to searching through data
Use a hash, with your numbers as keys. That way, instead of grepping through the entire array for every number you want to check, each check becomes a simple hash lookup.
#!/usr/bin/perl

my @array = map {int rand 1e6} 1..400000;

# create file with numbers to look up
open my $fh, ">", "in.txt" or die "$!";
for (1..1000000) {
    print $fh int rand 1e6, "\n";
}
close $fh;

my %lookup_table;
$lookup_table{$_}++ for @array;

open (in , "<", "in.txt") || die "$!";
while (<in>){
    my ($num) = m/^(\d+)/;
    print "$num, " if $lookup_table{$num};
}
close in;

__END__

$ time ./757954.pl >out

real    0m4.141s
user    0m4.004s
sys     0m0.132s
(Memory requirement is approx. 100 MB, or 80 MB if you get rid of the map for the @array initialisation.)
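One way to read the remark about dropping the map is to fill the hash directly in a loop, so that no temporary list of 400,000 numbers is ever built. The following is only a minimal sketch under that assumption; the helper name build_lookup and its parameters are made up for illustration and are not part of the script above, and the actual memory saving has not been measured here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): build the lookup
# table directly, without the intermediate @array or the list that map
# would create. The foreach over a range is iterated as a counter, so
# no list of $count elements is materialised.
sub build_lookup {
    my ($count, $range) = @_;
    my %lookup_table;
    $lookup_table{ int(rand($range)) }++ for 1 .. $count;
    return \%lookup_table;
}

# Same sizes as in the script above: 400,000 draws from 0 .. 999,999.
my $lookup = build_lookup(400_000, 1e6);
printf "%d distinct keys\n", scalar keys %$lookup;
```

The rest of the script stays as it is, except that the lookup becomes $lookup->{$num} because the helper returns a hash reference.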
Update: with 300_000_000 rows, it takes about 15 min., which includes creating the 2 GB random data file "in.txt" plus writing a 760 MB output file. (Memory requirement is the same, since the input file is read line by line.)
Re^2: searching through data
by evaluator (Monk) on Apr 17, 2009 at 08:59 UTC
by almut (Canon) on Apr 17, 2009 at 09:28 UTC