$ time perl foo.pl
real 0m9.204s
user 0m3.697s
sys 0m1.387s
$ cat foo.pl
#!/usr/bin/perl -w
use strict;
use warnings;
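# Force bytewise collation in sort(1) so its order matches Perl's eq/lt
$ENV{LC_ALL} = 'C';
open F1, 'sort s1|' or die "opening file 1: $!";
open F2, 'sort s2|' or die "opening file 2: $!";
open OU1, '>', 'uniq.1' or die "opening uniq.1: $!";
open OU2, '>', 'uniq.2' or die "opening uniq.2: $!";
open OU3, '>', 'common' or die "opening common: $!";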
# Prime the pump
my $in1 = <F1>;
my $in2 = <F2>;
while (1) {
last if !defined($in1) and !defined($in2);
if (!defined($in1)) {
# File 1 is empty, rest of File 2 is unique
print OU2 $in2;
$in2 = <F2>;
}
elsif (!defined($in2)) {
# File 2 is empty, rest of File 1 is unique
print OU1 $in1;
$in1 = <F1>;
}
elsif ($in1 eq $in2) {
# Line common to both
print OU3 $in1;
$in1 = <F1>;
$in2 = <F2>;
}
elsif ($in1 lt $in2) {
# Line unique to File 1
print OU1 $in1;
$in1 = <F1>;
}
else {
# Line unique to File 2
print OU2 $in2;
$in2 = <F2>;
}
}
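To make the walk easier to follow, here is a minimal sketch of the same comparison logic applied to two small in-memory lists instead of sorted pipes. The helper name split_sorted and the sample strings are made up for illustration and are not part of foo.pl; it assumes both lists are already sorted bytewise and contain no duplicate lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of foo.pl): walks two already-sorted
# lists in step and returns (only-in-A, only-in-B, in-both) as array refs.
sub split_sorted {
    my ($a_ref, $b_ref) = @_;
    my ($i, $j) = (0, 0);
    my (@only_a, @only_b, @common);
    while ($i < @$a_ref and $j < @$b_ref) {
        my $cmp = $a_ref->[$i] cmp $b_ref->[$j];
        if    ($cmp == 0) { push @common, $a_ref->[$i]; $i++; $j++; }
        elsif ($cmp <  0) { push @only_a, $a_ref->[$i++]; }
        else              { push @only_b, $b_ref->[$j++]; }
    }
    # One list ran out: whatever remains in the other is unique to it
    push @only_a, @{$a_ref}[$i .. $#{$a_ref}];
    push @only_b, @{$b_ref}[$j .. $#{$b_ref}];
    return (\@only_a, \@only_b, \@common);
}

# Made-up sample data, already in sorted order
my ($u1, $u2, $both) = split_sorted(
    [qw(AAB ABC BCA CCC)],
    [qw(ABC BBB CCA CCC)],
);
print "uniq.1: @$u1\n";    # AAB BCA
print "uniq.2: @$u2\n";    # BBB CCA
print "common: @$both\n";  # ABC CCC
```

The file version above does the same thing, except that it keeps only one line from each input in memory at a time, which is why it copes with 2,000,000-line files.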
I generated the two test files like so:
$ time perl gen_random_strings.pl 2000000 60 70 ABC >s1
real 0m45.910s
user 0m44.647s
sys 0m1.091s
$ time perl gen_random_strings.pl 2000000 60 70 ABC >s2
real 0m45.989s
user 0m44.475s
sys 0m1.138s
Using a quickie random-string generator:
#!/usr/bin/perl
#
# gen_random_strings.pl <num_strings> <min_length> <max_length> <alphabet>
#
my $num_strings = shift;
my $min_length = shift;
my $max_length = shift;
my $alphabet = shift or die "Missing argument(s)!";
my $alphabet_len = length($alphabet);
while ($num_strings--) {
my $num_chars = int(rand($max_length-$min_length+1))+$min_length;
my $string = '';
$string .= substr($alphabet, int(rand($alphabet_len)), 1)
for 1 .. $num_chars;
print $string, "\n";
}
Notes:
[1] It took only a few seconds because the data was still in RAM from having been generated. It would've only taken a few minutes otherwise, as you can tell from the time it took to generate the files.
[2] This code is a slight hack of a merge sort I illustrated some time ago.
...roboticus
When your only tool is a hammer, all problems look like your thumb.