Re: Comparing strings (exact matches) in LARGE numbers FAST

Frankly, I'd tackle this by sorting each file with the Unix sort utility and then scanning the two sorted files in parallel, merge-style, to pick out the lines they have in common.
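One caveat on the sorting step: the sort order has to agree with the comparison used during the scan. Perl's lt and gt compare byte by byte unless "use locale" is in effect, while sort follows the current locale's collation. A minimal sketch of the sort step, assuming hypothetical file names (file1.txt, file2.txt) and sample data made up for illustration:

```shell
# Work in a scratch directory so the sample files do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical sample inputs, created only for illustration.
printf 'pear\napple\nfig\n' > file1.txt
printf 'fig\nbanana\napple\n' > file2.txt

# LC_ALL=C forces plain byte-order sorting, which matches Perl's
# lt/gt string comparisons when "use locale" is not in effect.
LC_ALL=C sort file1.txt > file1.sorted
LC_ALL=C sort file2.txt > file2.sorted
```

The two .sorted files are then what gets handed to the scanning script.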

The scanning logic would look like this (untested):

#! /usr/bin/perl -w
use strict;

unless (2 == @ARGV) {
  die "Usage: $0 file1 file2\n";
}

open(FILE1, "<", $ARGV[0]) or die "Can't read '$ARGV[0]': $!";
open(FILE2, "<", $ARGV[1]) or die "Can't read '$ARGV[1]': $!";

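# Prime the merge with the first line of each (already sorted) file.
my $string1 = <FILE1>;
my $string2 = <FILE2>;

# Advance whichever file has the smaller line; print lines found in both.
# The loop ends as soon as either file runs out.
while (defined $string1 and defined $string2) {
  if ($string1 lt $string2) {
    $string1 = <FILE1>;
  }
  elsif ($string1 gt $string2) {
    $string2 = <FILE2>;
  }
  else {
    print $string1;
    $string1 = <FILE1>;
    $string2 = <FILE2>;
  }
}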


Re^2: Comparing strings (exact matches) in LARGE numbers FAST by runrig (Abbot) on Aug 29, 2008 at 00:24 UTC
After sorting each file, you can just use the Unix "join" command.
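A rough sketch of that approach, assuming two already sorted files with hypothetical names and made-up sample data. Note that join matches on the first blank-delimited field by default, so it only amounts to a whole-line match when the lines contain no blanks; comm -12 is shown alongside as an alternative that compares entire lines.

```shell
# Work in a scratch directory with hypothetical, pre-sorted sample files.
workdir=$(mktemp -d) && cd "$workdir"
printf 'apple\nfig\npear\n' > file1.sorted
printf 'apple\nbanana\nfig\n' > file2.sorted

# join pairs up lines whose first blank-delimited field is equal.
# With one word per line that is the same as an exact line match.
LC_ALL=C join file1.sorted file2.sorted > common_join.txt

# comm -12 hides lines unique to either file, leaving only the
# whole lines that appear in both.
LC_ALL=C comm -12 file1.sorted file2.sorted > common_comm.txt
```

Both tools expect their inputs to be sorted in the same collation they run under, hence LC_ALL=C on each command.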
Re^3: Comparing strings (exact matches) in LARGE numbers FAST by tilly (Archbishop) on Aug 29, 2008 at 11:37 UTC
Ah. Good point. However, when I use this technique I'm usually doing a little more logic, so I can't use that. (And I'm usually dealing with a dataset that didn't fit in the database I was using, so I can't use that approach either.)
Re^2: Comparing strings (exact matches) in LARGE numbers FAST by perlSD (Novice) on Aug 29, 2008 at 00:15 UTC
Thanks! That's pretty neat! I'll try that. Although the sorting may take a few minutes, at least on my Windows machine (I have Unix-like tools I can use from the command line)...