Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

How do I sort a CSV file on multiple columns?

by mmittiga17 (Scribe)
on Dec 09, 2009 at 18:33 UTC ( [id://811997]=perlquestion: print w/replies, xml ) Need Help??

mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I have been trying to find with in Text::CSV a way to sort on 1 column 1 first and then by 44th column. Reading through CPAN module page and googling I can not seem to find a way to sort on multiple fields using Text::CSV. can someone point me in the right direction? Thanks !

here is what I have so far: while( <IN2> ) { chomp; $count++; # skip header next unless $count; my $row; @$row = split( /,/, $_ ); push @$sheet, $row; } foreach my $row ( sort { $a->[0] cmp $b->[0] || $a->[44] cmp $b->[44] +} @$sheet ) { print OUT join( ',', @$row ), "\n"; }

problem is the 45th field can 1 or 001 2 or 002. I need to strip the preceding 00 and then sort. for some reason I notice that in field 45 after sorting it can look like: 1 10 11 17 2 4 5

Replies are listed 'Best First'.
Re: How do I sort a CSV file on multiple columns?
by kennethk (Abbot) on Dec 09, 2009 at 18:42 UTC
    Text::CSV is a module for writing and reading CSV data. If you want to sort the results, you have two choices in my mind (and I'm sure a host of others):
    1. You can slurp the file into some data structure - probably a list of lists (perllol) - and then sort the results, a la:

      #!/usr/bin/perl use strict; use warnings; my @lists = ([0,7,0], [1,3,0], [0,5,1], [1,2,1]); foreach my $list (sort {$a->[0] <=> $b->[0] or $a->[2] <=> $b->[2] } @ +lists) { print @$list, "\n"; }
    2. You can use a module that has the support you require. I like this idea better, using DBI and DBD::CSV. An example can be found at DBD::CSV.
Re: How do I sort a CSV file on multiple columns?
by GrandFather (Saint) on Dec 09, 2009 at 22:30 UTC

    Following up on the DBD::CSV suggestions by others this may get you started:

    use strict; use warnings; use DBI; my $tmpFileName = 'temp'; open my $outFile, '>', $tmpFileName or die "Can't open $tmpFileName: $ +!\n"; print $outFile $_ for <DATA>; close $outFile; my $dbh = DBI->connect ("DBI:CSV:") or die "DBI connect failed: $DBI::errstr"; my $sth = $dbh->prepare ("SELECT * FROM temp ORDER BY Name, Score"); $sth->execute(); while (my $row = $sth->fetchrow_hashref) { print "@{$row}{'Name', 'Score', 'State'}\n"; } $sth->finish(); $dbh->disconnect(); __DATA__ Name,Score,State "001","67","CA" "2","67","CA" "12","63","FL" "1","72","IL" "1","32","AZ"

    Prints:

    1 32 AZ 001 67 CA 1 72 IL 2 67 CA 12 63 FL

    True laziness is hard work

      I'd like to note (and obviously promote) the use of new syntax in DBD::CSV available now:

      my $dbh = DBI->connect ("dbi:CSV:", undef, undef, { PrintError => 1, RaiseError => 1, f_dir => ".", f_ext => ".csv/r", f_schema => undef, csv_null => 1, csv_auto_diag => 1, });

      Using f_ext like that enables you to mix code (.pl-files) and data (.csv-files) in a single directory, and still be able to only see the tables (without the .csv extension).

      merijn@tux:~> cat xx.pl #!/pro/bin/perl use strict; use warnings; use DBI; my $dbh = DBI->connect ("dbi:CSV:", undef, undef, { RaiseError => 1, PrintError => 1, f_dir => ".", f_ext => ".csv/r", f_schema => undef, csv_null => 1, csv_auto_diag => 1, }); my $sth = $dbh->prepare ("select b, e from xx order by d, f"); $sth->execute; while (my @row = $sth->fetchrow_array) { print "@row\n"; } merijn@tux:~> cat xx.csv a,b,c,d,e,f 1,2,3,4,5,6 1,3,4,5,3,4 2,5,2,5,2,5 2,4,1,5,8,1 3,5,2,8,1,9 4,5,6,7,8,9 5,1,4,2,5,3 6,2,5,7,1,8 merijn@tux:~> ls -d x* x0 x1 xloadall xx xx.csv xx.pl merijn@tux:~> perl xx.pl 2 5 3 3 5 2 4 8 5 1 5 8 1 5 2 1 merijn@tux:~>

      Enjoy, Have FUN! H.Merijn
Re: How do I sort a CSV file on multiple columns?
by redgreen (Priest) on Dec 09, 2009 at 18:42 UTC

    Text::CSV is only for reading for reading/writing CSV files. If you want to have them sorted, I would suggest using DBD::CSV with the DBI interface.

    Update:Since they were interested in reading the CSV sorted, I forgot that Text::CSV does more then just reading. Thanks kennethk.

Re: How do I sort a CSV file on multiple columns?
by Tux (Canon) on Dec 09, 2009 at 21:02 UTC

    Text::CSV and Text::CSV_XS are modules to parse or generate CSV data and not to process it, if you want to sort it, either do it in perl after having read all data, or use a module that has a higher level interface, like DBD::CSV or Spreadsheet::Read. Withe the first, you'll order using SQL commnands, with the second, you'll order by traversing a hash.


    Enjoy, Have FUN! H.Merijn
Re: How do I sort a CSV file on multiple columns?
by gmargo (Hermit) on Dec 09, 2009 at 18:52 UTC

    Untested but I think this would work.

    # Presume @rows is array of $csv->getline() rows my @rows; # Sort on column 1 then column 44. # Assume numeric data. my @sorted_rows = sort { $a->[0] <=> $b->[0] || $a->[43] <=> $b->[43] } @rows;
Re: How do I sort a CSV file on multiple columns?
by mmittiga17 (Scribe) on Dec 09, 2009 at 19:39 UTC

    here is some code I borrowed that would work. if I can figure out how to get rid of preceeding "00" in a column:

    #!/usr/bin/perl my $sheet; my $count = -1; while( <DATA> ) { chomp; $count++; # skip header next unless $count; my $row; @$row = split( /,/, $_ ); push @$sheet, $row; } foreach my $row ( sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } +@$sheet ) { print join( ',', @$row ), "\n"; } __DATA__ Name,Score,State "001","67","CA" "2","67","CA" "12","63","FL" "1","72","IL" "1","32","AZ"

      The  <=> numeric comparison automatically numerifys to a decimal number before the comparison:

      >perl -wMstrict -le "my @l = qw(10 010 0010 8 08 008 9 09 009 1 01 001); @l = sort { $a <=> $b } @l; print qq{@l}; " 1 01 001 8 08 008 9 09 009 10 010 0010
      The issue with your posted code can be traced back to your not using Text::CSV to parse the data, which I find ironic given your node title (Update: previous title "Text::CSV"). With your splitting above, your data values are $x = '"001"', not the $x = '001' you seem to expect. If you had turned on warnings, you would have gotten Argument ""2"" isn't numeric in numeric comparison (<=>) repeatedly, since "2" is clearly not numeric. Text::CSV will automatically clean up the quotes, and thus your numbers would have been properly be interpreted as numbers. A la:
      #!/usr/bin/perl use strict; use warnings; use Text::CSV; my $sheet; my $count = -1; my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attr +ibute. or die "Cannot use CSV: ".Text::CSV->error_diag (); my @rows = (); while ( my $row = $csv->getline( *DATA ) ) { push @rows, $row; } my $header = shift @rows; foreach my $row ( sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } +@rows ) { print join( ',', @$row ), "\n"; } __DATA__ Name,Score,State "001","67","CA" "2","67","CA" "12","63","FL" "1","72","IL" "1","32","AZ"

      See Use strict warnings and diagnostics or die.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://811997]
Approved by redgreen
Front-paged by redgreen
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (10)
As of 2024-04-18 15:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found