
That's far too long! It took only a few seconds [1] on my machine:

$ time perl foo.pl

real    0m9.204s
user    0m3.697s
sys     0m1.387s
$ cat foo.pl
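#!/usr/bin/perl
use strict;
use warnings;

# Force bytewise collation in sort(1) so that its ordering agrees with
# Perl's lt/eq comparisons below (a locale-aware sort may order differently).
$ENV{LC_ALL} = 'C';

open my $f1, '-|', 'sort', 's1' or die "opening file 1: $!";
open my $f2, '-|', 'sort', 's2' or die "opening file 2: $!";

open my $ou1, '>', 'uniq.1' or die "opening uniq.1: $!";
open my $ou2, '>', 'uniq.2' or die "opening uniq.2: $!";
open my $ou3, '>', 'common' or die "opening common: $!";

# Prime the pump
my $in1 = <$f1>;
my $in2 = <$f2>;

while (defined($in1) or defined($in2)) {
        if (!defined($in1)) {
            # File 1 is exhausted, rest of File 2 is unique
            print $ou2 $in2;
            $in2 = <$f2>;
        }
        elsif (!defined($in2)) {
            # File 2 is exhausted, rest of File 1 is unique
            print $ou1 $in1;
            $in1 = <$f1>;
        }
        elsif ($in1 eq $in2) {
            # Line common to both
            print $ou3 $in1;
            $in1 = <$f1>;
            $in2 = <$f2>;
        }
        elsif ($in1 lt $in2) {
            # Line unique to File 1
            print $ou1 $in1;
            $in1 = <$f1>;
        }
        else {
            # Line unique to File 2
            print $ou2 $in2;
            $in2 = <$f2>;
        }
}

# Flush the outputs and catch any write errors
close($_) or die "closing output file: $!" for $ou1, $ou2, $ou3;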
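For anyone who wants to see the merge logic without the files and pipes, here's a minimal in-memory sketch of the same idea. The helper name split_sorted and the sample data are made up for illustration; they aren't part of foo.pl. It assumes both lists are already sorted with plain string ordering, which is what lt/eq compare against.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from foo.pl): walk two sorted lists in step
# and split them into "only in A", "only in B" and "in both".
sub split_sorted {
    my ($a_ref, $b_ref) = @_;
    my (@only_a, @only_b, @common);
    my ($i, $j) = (0, 0);

    while ($i < @$a_ref && $j < @$b_ref) {
        if ($a_ref->[$i] eq $b_ref->[$j]) {
            # Same string at the head of both lists
            push @common, $a_ref->[$i];
            $i++;
            $j++;
        }
        elsif ($a_ref->[$i] lt $b_ref->[$j]) {
            # Head of A sorts first, so B cannot contain it
            push @only_a, $a_ref->[$i++];
        }
        else {
            # Head of B sorts first, so A cannot contain it
            push @only_b, $b_ref->[$j++];
        }
    }

    # One list is exhausted; whatever is left in the other is unique to it
    push @only_a, @{$a_ref}[$i .. $#$a_ref];
    push @only_b, @{$b_ref}[$j .. $#$b_ref];

    return (\@only_a, \@only_b, \@common);
}

# Made-up sample data
my @list1 = sort qw(AAB ABC BCA CAB);
my @list2 = sort qw(ABC BBB CAB CCC);

my ($u1, $u2, $both) = split_sorted(\@list1, \@list2);
print "uniq.1: @$u1\n";
print "uniq.2: @$u2\n";
print "common: @$both\n";
```

With that sample data it prints "uniq.1: AAB BCA", "uniq.2: BBB CCC" and "common: ABC CAB". One thing to keep in mind with this kind of merge: if a string occurs twice in one list and once in the other, the first copy is reported as common and the extra copy as unique, so dedupe the inputs first if that isn't what you want.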

I generated the two test files like so:

$ time perl gen_random_strings.pl 2000000 60 70 ABC >s1

real    0m45.910s
user    0m44.647s
sys     0m1.091s
$ time perl gen_random_strings.pl 2000000 60 70 ABC >s2

real    0m45.989s
user    0m44.475s
sys     0m1.138s

Using a quickie random-string generator:

#!/usr/bin/perl
#
#   gen_random_strings.pl  <num_strings> <min_length> <max_length> <alphabet>
#
my $num_strings = shift;
my $min_length  = shift;
my $max_length  = shift;
my $alphabet    = shift or die "Missing argument(s)!";

my $alphabet_len = length($alphabet);

while ($num_strings--) {
    my $num_chars = int(rand($max_length-$min_length+1))+$min_length;
    my $string = '';
    $string .= substr($alphabet, int(rand($alphabet_len)), 1) 
                  for 1 .. $num_chars;
    print $string, "\n";
}

Notes:

[1] It took only a few seconds because the data was still cached in RAM from having just been generated. Otherwise it would still have taken only a few minutes, as you can tell from the time it took to generate the files.

[2] This code is a slight hack of a merge sort I illustrated some time ago.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

In reply to Re: Compare 2 very large arrays of long strings as elements by roboticus
in thread Compare 2 very large arrays of long strings as elements by onlyIDleft
