
I'm trying to solve a problem that comes from database handling: two remote copies of a table are supposed to stay in sync, but somehow they have drifted apart and now differ. I dumped the tables to text (comma-separated format), made sure the records were in ascending order of their primary keys, and wrote these scripts to produce two files: one with the records missing from copy "two", and the other with the records missing from copy "one". I'd like some commentary on the algorithm, its correctness, other ideas, whatever comes off the top of your head.

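#!/usr/bin/perl
use strict;

# Arguments: file-name prefix, then the numbers of the two copies to compare.
my ($prefijo, $n1, $n2) = @ARGV;
my ($endOfFile1, $endOfFile2) = (0, 0);

# Each dump sits in a directory named after the copy number, zero-padded to 8 digits.
my $dir1 = ("0" x (8 - length($n1))) . $n1;
my $dir2 = ("0" x (8 - length($n2))) . $n2;

open FILE1, "<", "$dir1/$prefijo.txt" or die "Cannot open $dir1/$prefijo.txt: $!";
open FILE2, "<", "$dir2/$prefijo.txt" or die "Cannot open $dir2/$prefijo.txt: $!";

# "NoEn" = "not in": records of one copy that are missing from the other.
open OUTPUTF1, ">", "${prefijo}${n1}NoEn$n2" or die "Cannot write output: $!";
open OUTPUTF2, ">", "${prefijo}${n2}NoEn$n1" or die "Cannot write output: $!";
# "repetidos" = "repeated": records whose key repeats the previous one.
open OUTPUTR1, ">", "repetidos-${prefijo}${n1}NoEn$n2" or die "Cannot write output: $!";
open OUTPUTR2, ">", "repetidos-${prefijo}${n2}NoEn$n1" or die "Cannot write output: $!";

my $recordf1 = <FILE1> or $endOfFile1 = 1;
my $recordf2 = <FILE2> or $endOfFile2 = 1;
my ($key1, $key2);
my $firstpass = 1;
my ($prevkey1, $prevkey2);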

while ( !$endOfFile1 && !$endOfFile2 ) {

  if ($recordf1=~/^ *([0-9]+)\,/) {
    $prevkey1=$key1;
    $key1=$1;
  } else {
    undef $key1;
  }

  if ($recordf2=~/^ *([0-9]+)\,/) {
    $prevkey2=$key2;
    $key2=$1;
  } else {
    undef $key2;
  }

  if ( $key1 < $key2 ) {
  
    if (($key1 eq $prevkey1) && !$firstpass)
      { print OUTPUTR1 $recordf1;}
    else
      { print OUTPUTF1 $recordf1;}
    
    $recordf1=<FILE1> or $endOfFile1=1;
 
  } elsif ( $key1 > $key2 ) {
 
    if (($key2 eq $prevkey2) && !$firstpass)
      { print OUTPUTR2 $recordf2;}
    else
      { print OUTPUTF2 $recordf2;}
 
    $recordf2=<FILE2> or $endOfFile2=1;
 
  } else {
    $recordf1=<FILE1> or $endOfFile1=1;
    $recordf2=<FILE2> or $endOfFile2=1;
  }
  $firstpass=0 if ($firstpass);
}

while ( !$endOfFile1 ) {
  if ($recordf1=~/^ *([0-9]+)\,/) {
    $prevkey1=$key1;
    $key1=$1;
  } else {
    undef $key1;
  }

  if (($key1 eq $prevkey1) && !$firstpass)
    { print OUTPUTR1 $recordf1;}
  else
    { print OUTPUTF1 $recordf1;}
 
  $recordf1=<FILE1> or $endOfFile1=1;
  $firstpass=0 if ($firstpass);
}

while ( !$endOfFile2 ) {
  if ($recordf2=~/^ *([0-9]+)\,/) {
    $prevkey2=$key2;
    $key2=$1;
  } else {
    undef $key2;
  }

  if (($key2 eq $prevkey2) && !$firstpass)
    { print OUTPUTR2 $recordf2;}
  else
    { print OUTPUTF2 $recordf2;}
    
  $recordf2=<FILE2> or $endOfFile2=1;
  $firstpass=0 if ($firstpass);
}

close FILE1;
close FILE2;
close OUTPUTF1;
close OUTPUTF2;
close OUTPUTR1;
close OUTPUTR2;

This first script compares on a single-field key. It also saves records with repeated keys (from the second occurrence on) to a separate file. The second script, which for the sake of tidiness I'll post in a reply under this thread, compares on a two-field key.
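For comparison, here is a minimal sketch of the same single-field idea written as a plain merge over two key-sorted inputs. It is not the script above. The names next_record and key_diff are hypothetical, and it assumes numeric keys at the start of each line and collects results in memory instead of writing files. The one deliberate difference is that a key is parsed only when its record is read, so a record that is held back while the other side advances is never compared against itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the next record from a filehandle and return [key, line],
# or nothing at end of file. Lines without a leading numeric key are skipped.
sub next_record {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        return [$1, $line] if $line =~ /^ *([0-9]+),/;
    }
    return;
}

# Merge-style comparison of two inputs sorted by ascending key.
# Returns a hash ref: records found only on each side (only1, only2) and
# repeated-key records, from the second occurrence on (dup1, dup2).
sub key_diff {
    my ($fh1, $fh2) = @_;
    my %out = (only1 => [], only2 => [], dup1 => [], dup2 => []);
    my ($prev1, $prev2);

    # Fetch the next record of one side. A record repeating the previous key
    # is diverted to the duplicates list and the following record is tried.
    my $fetch = sub {
        my ($fh, $prev_ref, $dups) = @_;
        while (my $rec = next_record($fh)) {
            if (defined $$prev_ref && $rec->[0] == $$prev_ref) {
                push @$dups, $rec->[1];
                next;
            }
            $$prev_ref = $rec->[0];
            return $rec;
        }
        return;
    };

    my $r1 = $fetch->($fh1, \$prev1, $out{dup1});
    my $r2 = $fetch->($fh2, \$prev2, $out{dup2});

    while ($r1 || $r2) {
        # An exhausted side counts as "greater", so the other side drains.
        my $cmp = !$r2 ? -1 : !$r1 ? 1 : $r1->[0] <=> $r2->[0];
        if ($cmp < 0) {
            push @{ $out{only1} }, $r1->[1];
            $r1 = $fetch->($fh1, \$prev1, $out{dup1});
        } elsif ($cmp > 0) {
            push @{ $out{only2} }, $r2->[1];
            $r2 = $fetch->($fh2, \$prev2, $out{dup2});
        } else {
            # Same key on both sides: nothing is missing, advance both.
            $r1 = $fetch->($fh1, \$prev1, $out{dup1});
            $r2 = $fetch->($fh2, \$prev2, $out{dup2});
        }
    }
    return \%out;
}
```

Calling key_diff($fh1, $fh2) with two handles opened on the dumps would give back the four lists, which could then be printed to the same four output files as above. Folding the two tail loops into the main loop is only a design choice for the sketch; it relies on treating an exhausted input as larger than any key.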

In reply to Key-based diffs by haroldo
