in reply to merging .csv text input based on matching column field values in multiple rows

The basic algorithm is "sort and group":
  1. Read in your CVS file
  2. Sort the rows on id using sort - it is very flexible.
  3. Define a variable $currentId to store the current id. Initialize it as undef or a bogus value to insure that the first id read triggers the generation of XML for an order header.
  4. Loop through your records
    1. clean up bad data
    2. if id is the same as $currentId, generate the XML for an item
    3. if id is different, reset $currentId and print the XML for the order header.

Best, beth

Replies are listed 'Best First'.
Re^2: merging .csv text input based on matching column field values in multiple rows
by abrg (Initiate) on Feb 25, 2009 at 18:40 UTC
    beth, countzero, wol thanks for your help/encouragement.

    before i go much further, i should post the actual working code i currently have which has been modified to fit the input/output scenario above.

    #!/usr/bin/perl use strict; use warnings; use Text::CSV_XS; my $csv = Text::CSV_XS->new(); my $file = ''; if (defined $ARGV[0]) { $file = $ARGV[0]; } open(my $data, '<', $file) or die "Could not open '$file'\n"; while (my $line = <$data>) { if ($csv->parse($line)) { my @columns = $csv->fields(); ## set vars my $orderNumber = $columns[0]; my $name = $columns[1]; my $itemId = $columns[2]; my $itemDesc = $columns[3]; my $price = $columns[4]; open (OUTFILE, "> output/$orderNumber.xml") or die $! . " can' +t open the file\n"; #print OUTFILE <<XML; print OUTFILE "<order>\n"; print OUTFILE " <order_id>$orderNumber</order_id>\n"; print OUTFILE " <name>$name</name>\n"; print OUTFILE " <item>\n"; print OUTFILE " <item_id>$itemId</item_id>\n"; print OUTFILE " <item_desc>$itemDesc</item_desc>\n"; print OUTFILE " <price>$price</price>\n"; print OUTFILE " </item>\n"; print OUTFILE "</order>\n"; #XML close (OUTFILE); } else { warn "Line could not be parsed:\n"; } }
    i've read the sort documentation but i'm still having trouble wrapping my brain around how to actually execute it with the Text::CSV module (forgive me, i realize this is a pretty basic concept). the File::Sort module seems interesting, though, i feel i should be able to solve this natively in PERL (also encountering some strange issues when trying to run this module). essentially, how do i get at the file array and then look inside it to the first column's value on each row?

    it would appear that my input file is being defined in scalar context where it should be an array. am i getting warmer? ;)

    thanks again
      Showing your code was definitely a good move. Now I have a better idea of why you are finding how to use sort so confusing.

      To "sort and group" you'll need three passes through the file lines: one to read in the lines, one to sort, and one to print. You'll also need to store the data from the "reading in" phase in an an array of array references.

      I've included some code demonstrating the algorithm. You'll have to make adaptions to fit your situation, but it should at least show you how to use the sort-n-group algorithm. I don't have the CVS module you are using so I've done a simple chomp and split to emulate it. You'll want to substitute that part with your code to read in CVS. I've also read in sample lines from <DATA> and printed to STDOUT for convenience. It also should make it easy to just download the code and see if it does indeed do what you want.

      Best, beth

      The "sort and group" principle shown in other nodes is sound, but it makes most sense when you're only looking at arrays.

      Here's an implementation that uses both arrays and hashes, to make an internal perl representation of the data that corresponds more closely (compared to a just arrays) to what you need to output. The output loop therefore becomes a bit simpler.

      --
      use JAPH;
      print JAPH::asString();