shaezi has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering if I could get some input on doing this a better way. I have two files.
File One has data that looks like:

012323444, 4.5 Nuts & Bolts ,
234334323, Fuzzy Nut bunnies ,
434454554, Pineapple Pops ,
and so on........................

File Two has data that looks like:

3423434444, 45554342365, 012323444, 00000333, description 2 update1, data1, data2
4342344, 46565fg66, 234334323, 00004454, update this, data4, data6
523444234, 466763fg4, 434454554, 00005565, this too update, data7, data8

This is going to be a little hard to explain, but here it goes. In file 2 I'm trying to update the descriptions (field 5 on each line) using
the descriptions in file 1. The numbers in column 1 of file 1 are used as a key
to match the numbers in column 3 of file 2; when they match, the description from file 1 replaces
the description in file 2 (column 5). I thought the best way to do this was:

# open file1, split each line on commas, and read into %hash
# assume $code and $description are the key/value pairs of %hash

# open file2 and read it into an array
open(FILE2, "<", "file2") or die "can't open file2: $!";
my @array2 = <FILE2>;
close(FILE2);

# reopen file2 for writing
open(FILE2, ">", "file2") or die "can't open file2: $!";
foreach my $item (@array2) {
    chomp $item;
    my @values = split /,\s*/, $item;
    $values[4] = $hash{$values[2]};          # put the new description from the hash in
    print FILE2 join(", ", @values), "\n";   # write the line back into file2
}
close(FILE2);

I don't know if this is the cheesiest way to do it, but I'm sure there is a better way.
I'm still a newbie to Perl, so I don't know if I can do this with a regex.
In a nutshell: starting with file 2, I read the code in column 3, match that code against
column 1 of file 1, take the description from file 1, and come back to file 2 to update
the description in column 5.
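For what it's worth, the whole plan can be sketched with core Perl only. The sample data below is lifted from the post; the in-memory "files" (filehandles opened on scalar refs) are just to keep the sketch self-contained, and with real files you would open by name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# stand-ins for the two files, using the sample data from the post
my $file1 = "012323444, 4.5 Nuts & Bolts ,\n234334323, Fuzzy Nut bunnies ,\n";
my $file2 = "3423434444, 45554342365, 012323444, 00000333, old desc, data1, data2\n";

# Pass 1: build the code => description hash from file 1
my %desc;
open my $f1, '<', \$file1 or die $!;
while (<$f1>) {
    my ($code, $description) = split /\s*,\s*/;
    $desc{$code} = $description if defined $code;
}
close $f1;

# Pass 2: rewrite file 2, swapping in the new description
# when column 3 (index 2) matches a key in the hash
my $updated = '';
open my $f2, '<', \$file2 or die $!;
while (<$f2>) {
    chomp;
    my @f = split /\s*,\s*/;
    $f[4] = $desc{$f[2]} if exists $desc{$f[2]};
    $updated .= join(', ', @f) . "\n";
}
close $f2;

print $updated;   # 3423434444, 45554342365, 012323444, 00000333, 4.5 Nuts & Bolts, data1, data2
```

Splitting on /\s*,\s*/ rather than a bare /,/ soaks up the stray spaces around the commas in the sample data.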

Edit ar0n 2001-07-15 -- Fixed formatting

Re: a hash/array problem and can contents of an array be printed as one line
by synapse0 (Pilgrim) on Jul 15, 2001 at 13:59 UTC
    Well, there's no reason to read both files into memory; you can just process/replace the items in file 2 as you read it.
    for example
    open(FILE, "<$file1") || die "oh the misery, $!";
    while (<FILE>) {
        chomp;
        # making the assumption that each field
        # is delimited by comma space, safe?
        ($id, $desc) = split(/, /);
        $update_hash{$id} = $desc;
    }
    close(FILE);

    # create backup (parens needed so || doesn't bind to the filename)
    rename($file2, "$file2.bak") || die "bah, skum $!";

    open(INFILE, "<$file2.bak") || die "dagnabit! $!";
    open(OUTFILE, ">$file2")    || die "ookiemouth! $!";
    while (<INFILE>) {
        chomp;
        # once again, assuming data is delimited
        # by comma space
        @vals = split(/, /);
        # update if a new description exists
        $vals[4] = $update_hash{$vals[2]} if $update_hash{$vals[2]};
        # write file delimited by comma space (even last item)
        foreach $data (@vals) {
            print OUTFILE "$data, ";
        }
        print OUTFILE "\n";
    }
    close(INFILE);
    close(OUTFILE);
    Not a whole lot changed, just some extra stuff, including more validation..
    -Syn0
      If every time you process the file you split on /, / and then print using:
      # write file delimited by comma space (even last item) foreach $data (@vals) { print OUTFILE "$data, "; } print OUTFILE "\n";
      You will end up with a file with lots of trailing commas.
      Instead of the loop, you can use:
      print OUTFILE join(', ', @vals) . "\n";
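      A quick illustration of the difference (the field values are just made up to match the sample data):

```perl
use strict;
use warnings;

my @vals = ('4342344', '46565fg66', '234334323', '00004454', 'update this', 'data4', 'data6');

# loop version: every field, including the last, is followed by ", "
my $looped = '';
$looped .= "$_, " for @vals;

# join puts the delimiter only *between* fields
my $joined = join(', ', @vals);

print "$looped\n";   # ends in "data6, " -- the trailing delimiter
print "$joined\n";   # ends in "data6"   -- clean
```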

      Hope this helps
      Aziz,,,

        The original post had the commas (explicitly put in there by the code he wrote) at the end of the lines... Instead of changing his file formats, I just used the format that was stated.
        -Syn0
Re: a hash/array problem and can contents of an array be printed as one line
by mattr (Curate) on Jul 15, 2001 at 14:35 UTC
    I'd add another level of paranoia and make the backup under yet another filename, because if you run this twice with a bug, you'll have overwritten your good backup.

    Assuming you can ensure this is well-formed CSV, you can also attack these files as if they were SQL databases, using DBI with DBD::CSV or DBD::RAM. If the files are really long, DBD::RAM may be the best choice, since it looks like it can avoid reading the whole file into memory. For files that aren't huge, though, the strategy above is probably fastest.

    You probably want to run some tests, or at least monitor the output on screen, in case bad data (embedded commas, maybe?) throws off the routine.
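    If embedded commas do show up and the offending fields are quoted, the core module Text::ParseWords can split more safely than a bare split /, /. A small sketch (the sample line here is made up):

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# parse_line honors quotes, so a quoted field may contain the delimiter;
# the second argument (0) strips the quotes from the returned fields
my $line   = '523444234, 466763fg4, "Pops, Pineapple", data7';
my @fields = parse_line(',\s*', 0, $line);

print scalar(@fields), "\n";   # 4 -- the quoted comma did not split the field
print $fields[2], "\n";        # Pops, Pineapple
```

    A bare split /, / on that line would have produced five fields and mangled the description.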