ellem has asked for the wisdom of the Perl Monks concerning the following question:

I have some files that contain similar data.
Something along the lines of:

File1.txt

Name,SSN,File
"Doe John",123456789,3447

File2.txt

Name,Salary,ZIP
"Doe, John",50000,10003

I would like to merge (join?) this data together. I have 7 or 8 files like this. The only constant (unfortunately) is the Name. Of course, as people get married and learn how to spell, the Name occasionally changes.

My goal is to write something that will merge the files and create two files: Matches and Exceptions. I have read some things (5.10, 14.4, 14.8) in the Perl Cookbook but I am having a hard time wrapping my head around some of this stuff.

any ideas?
--
lmoran@wtsgSPAM.com
print "\x{263a}"

Replies are listed 'Best First'.
Re: Merging files
by sevensven (Pilgrim) on Oct 26, 2001 at 20:14 UTC

    Here is my no-code suggestion, since you seem to be needing more ideas than working code ;-)

    You could make a script that accepts the two file names as command line parameters.

    Then proceed to parse the first file a line at a time (skipping the header line), putting the info into a hash with the person's name as the hash key.

    Next, parse the second file, and for each name, look it up in the hash. If it's there, add the second file's info to the hash under that key and put a flag in there marking it as "ok"; otherwise print the second file's parsed info into a mismatches file.

    The final step is to run through the hash, and for each key that doesn't have the ok flag, put its info into the mismatches file.

    Hope this helps, although it would not be terribly efficient for large files, because it builds a hash linearly as big as the first file.
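    The steps above might look something like the following minimal sketch. The file names, field layout, and sample records are made up for illustration, and it uses the core Text::ParseWords module (rather than Text::CSV_XS) just so it runs without installing anything; it works on in-memory lines where a real script would read the two files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw( parse_line );    # core module; handles "Doe, John"

# Stand-ins for the two input files (first line is the header).
my @file1 = (
    'Name,SSN,File',
    '"Doe, John",123456789,3447',
    '"Smith, Anne",111223333,9999',
);
my @file2 = (
    'Name,Salary,ZIP',
    '"Doe, John",50000,10003',
    '"Jones, Bob",40000,02134',
);

# Step 1: hash the first file, keyed on name, with an "ok" flag.
my %people;
for my $line ( @file1[ 1 .. $#file1 ] ) {       # skip header
    my ( $name, @rest ) = parse_line( ',', 0, $line );
    $people{$name} = { fields => [@rest], ok => 0 };
}

# Step 2: walk the second file; matches merge in, misses are exceptions.
my ( @matches, @exceptions );
for my $line ( @file2[ 1 .. $#file2 ] ) {
    my ( $name, @rest ) = parse_line( ',', 0, $line );
    if ( exists $people{$name} ) {
        push @{ $people{$name}{fields} }, @rest;
        $people{$name}{ok} = 1;                 # flag as matched
        push @matches, join ',', qq{"$name"}, @{ $people{$name}{fields} };
    }
    else {
        push @exceptions, $line;                # name not in first file
    }
}

# Step 3: anything from the first file that never matched is an exception too.
for my $name ( sort keys %people ) {
    push @exceptions, join ',', qq{"$name"}, @{ $people{$name}{fields} }
        unless $people{$name}{ok};
}

print "MATCH: $_\n"     for @matches;
print "EXCEPTION: $_\n" for @exceptions;
```

    A real version would open the mismatches file with open() and print the exception lines there instead of to STDOUT.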

Re: Merging files
by Fletch (Bishop) on Oct 26, 2001 at 23:40 UTC
    use Text::CSV_XS ();

    my %data;

    parse_ssn_file( "file1.txt", \%data );
    parse_salary_file( "file2.txt", \%data );

    my $csv = Text::CSV_XS->new();
    foreach my $cur ( sort keys %data ) {
        $csv->combine( $cur, @{ $data{$cur} }{qw( ssn file salary zip )} );
        print $csv->string, "\n";
    }

    exit 0;

    sub parse_ssn_file {
        my( $file, $data ) = @_;
        local( *CUR );
        open( CUR, $file ) or die "Can't open $file: $!\n";
        scalar <CUR>;    # discard header
        my $csv = Text::CSV_XS->new();
        while( <CUR> ) {
            $csv->parse( $_ );
            my @cur = $csv->fields();
            $data->{ $cur[0] }->{ssn}  = $cur[1];
            $data->{ $cur[0] }->{file} = $cur[2];
        }
        return;
    }

    sub parse_salary_file {
        my( $file, $data ) = @_;
        local( *CUR );
        open( CUR, $file ) or die "Can't open $file: $!\n";
        scalar <CUR>;    # discard header
        my $csv = Text::CSV_XS->new();
        while( <CUR> ) {
            $csv->parse( $_ );
            my @cur = $csv->fields();
            $data->{ $cur[0] }->{salary} = $cur[1];
            $data->{ $cur[0] }->{zip}    = $cur[2];
        }
        return;
    }

    __END__
    file1.txt:

        Name,SSN,File
        "Doe, John",123456789,3447
        "Flinstone, Fred",000000302,1234

    file2.txt:

        Name,Salary,ZIP
        "Flinstone, Fred",34500,90210
        "Doe, John",50000,10003

    Output:

        "Doe, John",123456789,3447,50000,10003
        "Flinstone, Fred",000000302,1234,34500,90210

    Update: D'oh, might help if I answered the whole question. :/ If you want to catch exceptions then you'd want to pick whatever file is going to be canonical and parse it first. Have all the other parse_foo_file routines first do an exists $data->{$cur} to check whether it's a valid, existing name before inserting the data. If it's not, have an open log file (open( REJ, ">$filename.rej" ) or die $!) that you can print the rejected entries into.
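    That exists() check could look something like the following sketch. The sub name and sample rows are hypothetical, it collects rejects in an array rather than printing to a REJ handle so it is self-contained, and it uses the core Text::ParseWords module in place of Text::CSV_XS so it runs as-is:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw( parse_line );    # core; copes with quoted commas

# %data as the canonical (first-parsed) file would have left it.
my %data = ( 'Doe, John' => { ssn => '123456789', file => '3447' } );
my @rej;                                  # stand-in for the .rej file

# Hypothetical per-row version of parse_salary_file with the
# exists() guard: only names already in %data get merged.
sub add_salary_row {
    my ( $line, $data, $rej ) = @_;
    my ( $name, $salary, $zip ) = parse_line( ',', 0, $line );
    if ( exists $data->{$name} ) {        # valid, existing name?
        $data->{$name}{salary} = $salary;
        $data->{$name}{zip}    = $zip;
    }
    else {
        push @$rej, $line;                # unknown name: reject the row
    }
}

add_salary_row( '"Doe, John",50000,10003',   \%data, \@rej );
add_salary_row( '"Smith, Anne",42000,90210', \%data, \@rej );

print "salary for Doe: $data{'Doe, John'}{salary}\n";
print "rejected: $_\n" for @rej;
```

    In the real script the push would be a print to the already-open REJ handle.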

      Thanks! This really did clear up a lot of what I was doing wrong.
      Two questions:
      1 - Although you commented it I don't understand what
      scalar <CUR>;
      does.

      2 - Instead of using Names as the key you're using two columns? What would you suggest for files with a greater number of columns? Could I just continue on with $cur[3],[4] etc.?
      --
      lmoran@wtsgSPAM.com
      print "\x{263a}"

        As for number one, that's just throwing away the first line. It's just the normal readline diamond operator in void context (technically the scalar is superfluous, but I put it in out of habit). The return value is being discarded rather than being saved anywhere (my $toss = <CUR> would have done just the same thing with a (useless) temporary).

        As for the data structure, it's a hash of hashes. The top level hash (%data) has keys that are the primary key from your data file (the names), and the corresponding value is a hash of key/value pairs from the various data files. Add something like the following to the end of the program and you can see what it looks like:

        ...
        use Data::Dumper qw( Dumper );
        print Dumper( \%data );

        See also perldoc perldsc.

Re: Merging files
by DamnDirtyApe (Curate) on Oct 26, 2001 at 21:27 UTC

    If you want a bit of conceptual guidance, you may find something useful in Data Munging with Perl. Unfortunately, as I post this, the link to the two sample chapters isn't working. Check back here later, though; there's some good advice about the kind of thing you're talking about, along with some code examples.

    _______________
    D a m n D i r t y A p e
    Home Node | Email