how to merge similar data

alexiskb has asked for the wisdom of the Perl Monks concerning the following question:

Sorry for the newbie question...
I have 2 files with a common field.
I thought I could just put all the rows from both tables into a hash and then do a hash lookup...
but I seem to be getting memory problems: the process works fine, then just stalls after 1000 records...
May I ask for pointers on what I should be looking at doing, i.e. should I use arrays? hashes? I don't want to use modules.
Thank you!
Here is what I've got:

TGR|10 GROUP|www.10group.co.uk#http://www.10group.co.uk#|0121 333 5464|johnj beck|info@10group.co.uk|
SVG|7 GROUP|www.7group.com|0121 233 1122|tim rice|tim@7.com|

etc... for 4000 records, joined on field[0] to:

TGR|10 GROUP|10 GROUP PLC|GB|54|0.40|0.045|200000|GBX|
SVG|7 GROUP|7 GROUP PLC ORD|GB|63|1.00|0.35|0.550|5000|GBX|

etc... for 4000 records
and here is my poor excuse for baby perl:


### do original data

open(COMPANIES, "+< ./data1") or die "can't open file: $!";
    $co = 0;
while (<COMPANIES>) {
    $contents[$co] = $_;
    $co++;
    }
    
    foreach $record (@contents)  {
    
    @fields = split(/\|/,$record);
    $tidm=$fields[0];
    etc...


$lse{$tidm}++;    
    
}
close(COMPANIES);


### do data to be merged


open(INFO, "+< ./info.csv") or die "can't open file: $!";
    $co2 = 0;
while (<INFO>) {
    $contents2[$co2] = $_;
    $co2++;
    }

    foreach $record2 (@contents2)  {
    @fields2 = split(/\|/,$record2);
    $tidm_inf=$fields2[0];
    etc...
    
    $merge{$tidm_inf}="$record2";    
    
    }
    
close(INFO);


### print merged


foreach my $k (sort keys %lse) {
    print "$merge{$k}\n"; 
}


Replies are listed 'Best First'.
Re: how to merge similar data by katgirl (Hermit) on Sep 19, 2002 at 10:09 UTC
Here we go (you may find a better way later):

#!/usr/bin/perl

### do original data
open(COMPANIES, "+< ./data1") or die "can't open file: $!";
@contents = <COMPANIES>;
close(COMPANIES);
chomp(@contents);

open(INFO, "+< ./info.csv") or die "can't open file: $!";
@contents2 = <INFO>;
close(INFO);
chomp(@contents2);

### do data to be merged
$merges = 0;
foreach $record (@contents) {
    @fields = split(/\|/, $record);
    $tidm = $fields[0];
    foreach $record2 (@contents2) {
        @fields2 = split(/\|/, $record2);
        $tidm_inf = $fields2[0];
        if ($tidm eq $tidm_inf) {
            # append fields 2..8 of the matching info record
            $merged[$merges] = join("|", (@fields, @fields2[2..8]));
            $merges++;
        }
    }
}

### print merged
open(FILE, ">outfile.dat") or die "can't open outfile.dat: $!";
foreach $merge (@merged) {
    # write to the output file rather than STDOUT
    print FILE "$merge\n";
}
close(FILE);
Re: Re: how to merge similar data by alexiskb (Acolyte) on Sep 19, 2002 at 10:38 UTC
Perfect, thanks katgirl, that works perfectly! I shall study your code well. Thanks very much indeed.
Re: how to merge similar data by kabel (Chaplain) on Sep 19, 2002 at 09:30 UTC
Some suggestions: if it is similar data, then write a function which takes two arguments: a hash reference where you store the data, and a file name from which the data comes. Does the second field belong to the company name, or is it standalone in both files?
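A rough sketch of that suggestion, in case it helps. The names load_records and merge_records are made up for illustration, and it assumes the first field is unique within each file and that every field of the second file after the key should be appended. It uses no modules beyond the strict and warnings pragmas, and reads each file once instead of comparing every record against every other.

```perl
use strict;
use warnings;

# Hypothetical helper: read a pipe-delimited file into the hash that
# $store refers to, keyed on the first field of each line.
sub load_records {
    my ($store, $file) = @_;
    open(my $fh, '<', $file) or die "can't open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;           # skip blank lines
        my @fields = split /\|/, $line;     # trailing empty fields are dropped
        $store->{ $fields[0] } = \@fields;  # a repeated key overwrites the earlier record
    }
    close($fh);
    return scalar keys %$store;             # number of distinct keys loaded
}

# Hypothetical helper: for every key present in both hashes, append the
# right-hand fields (minus the key itself) to the left-hand fields.
sub merge_records {
    my ($left, $right) = @_;
    my @merged;
    foreach my $key (sort keys %$left) {
        next unless exists $right->{$key};  # keep only keys found in both files
        my @extra = @{ $right->{$key} };
        shift @extra;                       # drop the duplicate key field
        push @merged, join('|', @{ $left->{$key} }, @extra);
    }
    return @merged;
}

# Run as: perl merge.pl ./data1 ./info.csv
if (@ARGV == 2) {
    my (%companies, %info);
    load_records(\%companies, $ARGV[0]);
    load_records(\%info,      $ARGV[1]);
    print "$_\n" for merge_records(\%companies, \%info);
}
```

Storing array references rather than raw lines means each line is split only once. If the company name in the second field is repeated in both files, the shift in merge_records could be changed to a splice that drops the first two fields.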