Re: Matching and combining two text files

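The standard technique for looking stuff up is to use a hash: read the first file once to map each doc num to its parcel#, then look up each line of the second file in that hash.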

use strict;
use warnings;

my $file1 = <<FILE1;
parcel# 12345

doc num 123
doc num 456 
doc num 789

parcel# 67890

doc num 342
doc num 657 
doc num 876
FILE1

my $file2 = <<FILE2;
doc num 342 data data data data data data data data
doc num 657 data data data data data data data data
doc num 876 data data data data data data data data
doc num 123 data data data data data data data data
doc num 456 data data data data data data data data
doc num 789 data data data data data data data data
FILE2

my %docs;
my $currParcel;

open my $f1In, '<', \$file1;
while (<$f1In>) {
    chomp;
    next if ! $_;
    
    if (/parcel#\s+(\d+)/) {
        $currParcel = $1;
        next;
    }
    
    next if ! defined $currParcel || ! /^doc num (\d+)/;
    $docs{$1} = $currParcel;
}
close $f1In;

open my $f2In, '<', \$file2;
while (<$f2In>) {
    chomp;
    next if ! /doc num\s+(\d+)\s+(.*)/;
    
    if (! exists $docs{$1}) {
        warn "Parcel not known for $1\n";
        next;
    }
    
    print "parcel# $docs{$1} doc num $1 $2\n";
}
close $f2In;

Prints:

parcel# 67890 doc num 342 data data data data data data data data
parcel# 67890 doc num 657 data data data data data data data data
parcel# 67890 doc num 876 data data data data data data data data
parcel# 12345 doc num 123 data data data data data data data data
parcel# 12345 doc num 456 data data data data data data data data
parcel# 12345 doc num 789 data data data data data data data data
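The heredocs and in-memory filehandles above only stand in for the real files. Reading actual files works the same way. Below is a minimal sketch that wraps the two passes in a sub (the name `merge_files` and the command-line usage are hypothetical) and checks that each `open` succeeds. Pass 1 builds the doc num => parcel hash. Pass 2 streams the data file and looks each line up, so only the lookup table has to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two-pass hash lookup shown above.
# Returns the merged lines instead of printing them.
sub merge_files {
    my ($parcelFile, $dataFile) = @_;
    my (%docs, $currParcel, @out);

    # Pass 1: build the doc num => parcel# lookup table.
    open my $f1In, '<', $parcelFile or die "Can't open $parcelFile: $!\n";
    while (<$f1In>) {
        if (/parcel#\s+(\d+)/) {
            $currParcel = $1;
            next;
        }
        # Skips blank lines and anything before the first parcel# line.
        next if !defined $currParcel || !/^doc num (\d+)/;
        $docs{$1} = $currParcel;
    }
    close $f1In;

    # Pass 2: stream the data file and look each doc num up in the hash.
    open my $f2In, '<', $dataFile or die "Can't open $dataFile: $!\n";
    while (<$f2In>) {
        chomp;
        next if !/doc num\s+(\d+)\s+(.*)/;
        my ($doc, $data) = ($1, $2);
        if (!exists $docs{$doc}) {
            warn "Parcel not known for $doc\n";
            next;
        }
        push @out, "parcel# $docs{$doc} doc num $doc $data";
    }
    close $f2In;
    return @out;
}

# Hypothetical usage: perl merge.pl parcels.txt data.txt
if (@ARGV == 2) {
    print "$_\n" for merge_files(@ARGV);
}
```

If the same doc num appears under two parcels, the later parcel silently wins. That is the same behaviour as the original code.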

However, this task looks like it should really be using a database. If there are more than a few hundred entries in the files, and the data is likely to be referenced more than a small number of times, a database will make your life much happier (eventually).

True laziness is hard work



Re^2: Matching and combining two text files
by koolgirl (Hermit) on Jan 23, 2012 at 04:36 UTC

Thanks, GrandFather. I suspected as much about the hash, but my experience with hashes is a bit limited, so I had a hard time envisioning how to match up the keys and values (although now it seems obvious). Yes, the company I'm working for is using a DB. I'm actually writing the code to put the data there by creating a .csv out of all the collected data. Unfortunately, that means dealing with about half a million records, and even a small chunk of that to work on and test is mind-boggling.

Since I began working as a Perl programmer *sniff....koolgirl's growing up...*, half of the time I feel brilliant, and the other half I feel like a complete dumb a$#. I guess it evens out eventually?
