in reply to Comparing strings (exact matches) in LARGE numbers FAST

Frankly, I'd tackle this by sorting each file with the Unix sort utility and then scanning the two sorted files in parallel, merge-style, to pick out the lines they have in common.

The scanning logic would look like this (untested):

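#! /usr/bin/perl -w
use strict;

unless (2 == @ARGV) {
    die "Usage: $0 file1 file2";
}
open(FILE1, "<", $ARGV[0]) or die "Can't read '$ARGV[0]': $!";
open(FILE2, "<", $ARGV[1]) or die "Can't read '$ARGV[1]': $!";

my $string1 = <FILE1>;
my $string2 = <FILE2>;

# Both inputs must already be sorted in the order that lt/gt use
# (plain byte order, e.g. from LC_ALL=C sort).
# Stop as soon as either file runs out; defined() avoids ending early
# on a final line that is just "0" with no newline.
while (defined $string1 and defined $string2) {
    if ($string1 lt $string2) {
        # file1 is behind, so advance it
        $string1 = <FILE1>;
    }
    elsif ($string1 gt $string2) {
        # file2 is behind, so advance it
        $string2 = <FILE2>;
    }
    else {
        # exact match: print it and advance both files
        print $string1;
        $string1 = <FILE1>;
        $string2 = <FILE2>;
    }
}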

Re^2: Comparing strings (exact matches) in LARGE numbers FAST
by runrig (Abbot) on Aug 29, 2008 at 00:24 UTC
    After sorting each file, you can just use the unix "join" command.
      Ah, good point. However, when I use this technique I'm usually doing a little more logic per line, so join doesn't fit. (And I'm usually dealing with a dataset that didn't fit in the database I was using, so I can't use that approach either.)
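For anyone following along, here is a minimal sketch of the sort-then-join route suggested above. The file names and sample strings are made up for illustration. It also assumes each line is a single token with no whitespace, because join matches on the first whitespace-separated field by default; for lines that may contain spaces, comm -12 on the two sorted files would be a closer fit.

```shell
# Hypothetical sample inputs; real files would be far larger.
workdir=$(mktemp -d)
printf 'delta\nalpha\ncharlie\nbravo\n' > "$workdir/file1.txt"
printf 'echo\ncharlie\nalpha\nfoxtrot\n' > "$workdir/file2.txt"

# Sort in plain byte order so that sort and join agree on the ordering.
LC_ALL=C sort "$workdir/file1.txt" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2.txt" > "$workdir/file2.sorted"

# join pairs up lines whose first field is equal in both sorted files.
# With one token per line, that is an exact-match filter.
common=$(LC_ALL=C join "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$common"
```

With these sample files it prints alpha and charlie, the two strings present in both. Forcing LC_ALL=C on both commands keeps the collation identical, which is what join relies on; the same setting matters if the sorted files are fed to a Perl scan that compares with lt and gt.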
Re^2: Comparing strings (exact matches) in LARGE numbers FAST
by perlSD (Novice) on Aug 29, 2008 at 00:15 UTC
    Thanks! That's pretty neat! I'll try that. Although the sorting may take a few minutes, at least on my Windows machine (I have Unix-like tools I can use from the command line)....