To "sort and group" you'll need three passes through the file lines: one to read in the lines, one to sort, and one to print. You'll also need to store the data from the "reading in" phase in an array of array references.
I've included some code demonstrating the algorithm. You'll have to make adaptations to fit your situation, but it should at least show you how to use the sort-n-group algorithm. I don't have the CSV module you are using, so I've done a simple chomp and split to emulate it. You'll want to substitute that part with your own code to read in CSV. I've also read the sample lines from <DATA> and printed to STDOUT for convenience. That should make it easy to just download the code and see whether it does indeed do what you want.
use strict;
use warnings;

my $IDX_ID    = 0;
my $IDX_NAME  = 1;
my $IDX_ITEM  = 2;
my $IDX_DESC  = 3;
my $IDX_PRICE = 4;

#-------------------------------------------------
# read lines into an *array of array references*
#-------------------------------------------------

my @aLines;
while (my $line = <DATA>) {

    # get rid of trailing newline
    chomp $line;

    # break into columns
    # Note: I've used a simple split to populate columns,
    # since I don't have Text::CSV_XS on my system
    my @aColumns = split( /\s*,\s*/, $line );

    # convert @aColumns to a reference and add it to
    # the end of the array
    # Note: this part is the same, no matter whether you use
    # split or Text::CSV_XS
    push @aLines, \@aColumns;
}

#-------------------------------------------------
# sort the lines on id
#-------------------------------------------------

# {...} is an anonymous block used for sorting.
# It has two special implicit parameters, $a and $b, which
# together represent two elements of @aLines, i.e. two
# different array references. These array references are just
# columns and we want to compare ids.
#
# Since these are references to arrays rather than arrays
# we use $a->[...] instead of $a[...]
#
# The block must return a negative, zero, or positive number,
# so we use the numeric comparison operator <=> rather than <.
my @aSorted = sort { $a->[$IDX_ID] <=> $b->[$IDX_ID] } @aLines;

#-------------------------------------------------
# now group and print
#-------------------------------------------------

my $currentId;
for my $aColumns (@aSorted) {
    my $id       = $aColumns->[$IDX_ID];
    my $name     = $aColumns->[$IDX_NAME];
    my $itemId   = $aColumns->[$IDX_ITEM];
    my $itemDesc = $aColumns->[$IDX_DESC];
    my $price    = $aColumns->[$IDX_PRICE];

    # do we have the start of a new order?
    if (!defined($currentId) || ($id ne $currentId)) {

        # if we are finishing an order, close it before we begin
        # the next one.
        if (defined($currentId)) {
            print "</order>\n";
        }

        # set current id to the new id
        $currentId = $id;

        # print out order header XML
        print "<order>\n";
        print "  <order_id>$id</order_id>\n";
        print "  <name>$name</name>\n";
    }

    # print XML for item
    print "  <item>\n";
    print "    <item_id>$itemId</item_id>\n";
    print "    <item_desc>$itemDesc</item_desc>\n";
    print "    <price>$price</price>\n";
    print "  </item>\n";
}

# close the final order, if there was one
if (defined($currentId)) {
    print "</order>\n";
}

__DATA__
1, bill smith, 11, red shoes, 39.99
2, john doe, 32, black hat, 21.59
3, jane lee, 12, green shoes, 29.99
2, john doe, 11, red shoes, 39.99
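If you do have Text::CSV_XS installed, the reading phase might look something like the sketch below. I can't run it here, so treat it as a starting point rather than tested code. The sub name read_rows and the in-memory sample are my own inventions for illustration, and the allow_whitespace option is an assumption chosen so that blanks around the commas are trimmed much as the split pattern trims them. The sort and group phases stay the same, because getline hands back an array reference per line.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Read every CSV record from an open filehandle and return a list
# of array references, one inner array per input line.
# (read_rows is a hypothetical helper name, not part of the module.)
sub read_rows {
    my ($fh) = @_;

    # binary           : accept any bytes inside fields
    # auto_diag        : report parse errors instead of failing silently
    # allow_whitespace : strip blanks around the separators
    my $csv = Text::CSV_XS->new(
        { binary => 1, auto_diag => 1, allow_whitespace => 1 } );

    my @aLines;
    while ( my $aColumns = $csv->getline($fh) ) {
        # getline already returns an array reference
        push @aLines, $aColumns;
    }
    return @aLines;
}

# Usage: an in-memory filehandle stands in for the real .csv file
my $sample = "1, bill smith, 11, red shoes, 39.99\n"
           . "2, john doe, 32, black hat, 21.59\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my @rows = read_rows($fh);
close $fh;

print scalar(@rows), " rows, first name: $rows[0][1]\n";
```

Unlike a plain split, the module should also cope with quoted fields that contain commas, which is the main reason to prefer it for real data.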
Best, beth