jasonwolf has asked for the wisdom of the Perl Monks concerning the following question:

Help with converting Python script to Perl for CSV sort

I was able to code a rough application in Python for sorting; however, I find coding in Perl more reward able for some strange reason. Due to the latter, I want to create a similar solution in Perl.

However, I am having a hard time understanding a number of examples I have already found.

http://stackoverflow.com/questions/4221424/perl-sort-csv-on-a-certain-column sorting CSV files

If someone has the time, could they whip up a comparable solution, and explain each section? Would it matter if I wanted to sort by number in field 1 or by alpha characters in field 2? Can the same code be used for both?

Example: from link 512942

#!/usr/bin/perl my $sheet; my $count = -1; while( <DATA> ) { chomp; $count++; # skip header next unless $count; my $row; @$row = split( /,/, $_ ); push @$sheet, $row; } foreach my $row ( sort { $a->[1] <=> $b->[1] } @$sheet ) { print join( ',', @$row ), "\n";

I am not understanding the sort in the “foreach” loop… is this regex ? Should I focus on learning REGEX in Perl better? Just need some direction and pointers. To be honest, I feel so lost that I am not even sure how to articulate my questions; however, I want to keep learning Perl even if I find it difficult.

My Python app

import csv field = 1 data = [] with open('input.csv') as csvfile: readCSV = csv.reader(csvfile, delimiter=',') for row in readCSV: data.append(row) def getKey(item): return item[field] sorted = sorted(data, key=getKey) f = open('output.csv','w') for i in sorted: s = ",".join(i) f.write(s + '\n') f.write(s + '\n')

Example of CSV file, which is only used for example

1,Beginning C,Beginning C1 2,Beginning C++,Beginning C++1 3,Python Intro,Python Intro1 4,Acme cook book,Acme cook book1 5,Jumping Jack Flash,Jumping Jack Flash1 6,Zebra,Zebra1 7,Ace hardware,Ace hardware1 8,Baker's dozon,Baker's dozon1 9,Jumbo frames,Jumbo frames1 10,Attack show,Attack show1 11,car 54 where are you,car 54 where are you1 12,navy blue,navy blue1 13,navy gold,navy gold1
Thank you

Replies are listed 'Best First'.
Re: Help with converting Python script to Perl for CSV sort
by hippo (Archbishop) on Jan 31, 2017 at 14:36 UTC

    It looks from the content of your post that you mean CSV rather than CVS. With that in mind you should definitely consider one of the CSV-based modules such as Text::CSV_XS rather than rolling your own.

    I am not understanding the sort in the “foreach” loop… is this regex ?

    No, this is sort. Perl has extensive, searchable, online documentation. Have a read of the linked page and see if that makes it all clear. Also perhaps the FAQ: How do I sort an array by (anything)?.

      Thank you for pointing out my mistype in CSV. I have started reading about the Text::CSV
        Python sucks
        use strict; use warnings; use Data::Dumper; use Spreadsheet::Read; use Text::CSV; my $book = ReadData("test.csv"); my @rows = sort { $a->[1] cmp $b->[1] } Spreadsheet::Read::rows($book- +>[1]); open my $fh, ">pl.csv"; my $csv = Text::CSV->new; $csv->eol("\r\n"); $csv->print($fh, $_) for @rows; close $fh;
Re: Help with converting Python script to Perl for CSV sort
by poj (Abbot) on Jan 31, 2017 at 14:52 UTC
    Can the same code be used for both?

    Not exactly, to sort field 2 alpha

    sort { $a->[1] cmp $b->[1] }

    or field 1 numeric

    sort { $a->[0] <=> $b->[0] }
    poj
      Understand... Thank you. I am currently reviewing the "sort" page to understand the capabilities. Much appreciated.
Re: Help with converting Python script to Perl for CSV sort
by Laurent_R (Canon) on Jan 31, 2017 at 15:14 UTC
    Once your $sheet array ref is populated with your data (which your program is doing more or less correctly), sorting it is fairly easy.

    Sorting numerically on the first field:

    my @sorted = sort {$a->[0] <=> $b->[0]} @$sheet;
    Sorting alphabetically on the second field:
    my @sorted = sort {$a->[1] cmp $b->[1]} @$sheet;
    Note that populating a $sheet array ref (rather than a plain @sheet array) is making things a little bit more complicated than they need to be.

    Update: poj typed faster than I did. ;)

      Let me take a moment and see if I fully understand the code, before I move on. I think I am getting excited and ahead of myself. So
      1. my $sheet; 2. my $count = -1; 3. 4. while( <DATA> ) { 5. chomp; 6. $count++; 7. # skip header 8. next unless $count; 9. my $row; 10. @$row = split( /,/, $_ ); 11. push @$sheet, $row; 12. }
      The above code is using “strict” since I am using “my” in front of “$sheet” and “$count”. “$sheet” and “$count” are scalars/variable and ‘count’ is getting assigned a negative value. However, I believe “$sheet” is getting set to an empty or undefined scalar that can be defined later. Line 4. Is the start of the while loop; however, not sure what to call ‘<DATA>’ Line 5. Is chomping the ‘\n’ from end of line Line 6. Is adding one to $count to skip the column headers/column titles Line 7. Comment Line 8. Not sure but I think this is saying to skip the top row if there is something in count?? Line 9. Read in row from file??? Line 10. Assign row data into array?? And split on comma Line 11. Use the PUSH command to add what is in $row to @$sheet??? Or do I have that backwards?
      foreach my $row ( sort { $a->[1] <=> $b->[1] } @$sheet ) { print join( ',', @$row ), "\n";
      This is where I do my sort as you pointed out in your post, and where I need to format my SORT syntax, which I am reading up on, but you have already assisted with it. Not sure I understand what you mean about “Note that populating a $sheet array ref (rather than a plain @sheet array) is making things a little bit more complicated than they need to be.” What is more complicated? Note – the example I am using is just an example I found on the Internet that I am trying to understand. Thank you JW
        The fact that you're using "my" does not prove that you're using strict, and the use strict; is not there in the code you show. It goes the other way around: if you use strict, then you have to use my (or some other declarator).

        If you use the $sheet variable (with a $ sigil), then you declare a scalar variable. If you had used @sheet, you would have declared an array variable. Independently of whether this variable is defined immediately or left "empty" for the moment and populated later.

        DATA is a special file handle referring to some data put at the end of your script, after a __DATA__ tag (see examples at the bottom of this post).

        If you want to read from a file, then you would have to open the file first and read from the file handle used for opening the file, with something like:

        open my $FH, "<", "file.txt" or die "Cannot open file.txt $!"; while (<$FH>) { # ... }

        On the $count variable: you initialize it to -1, and you inc rement it in your while loop. The first time through the loop, its value becomes 0, and the line with the header is skipped because a 0 value is evaluated to false in Boolean context.

        Line 10 splits the input line stored in $_ and stores the resulting array into the $row array ref. And the next code line stored the row array ref into the $sheet array ref.

        Here is how you could minimally change your code to get the sorted result:

        use strict; use warnings; use Data::Dumper; my $sheet; my $count = -1; while( <DATA> ) { chomp; $count++; # skip header next unless $count; my $row; @$row = split( /,/, $_ ); push @$sheet, $row; } #my @sorted = sort {$a->[0] <=> $b->[0]} @$sheet; my @sorted = sort {$a->[1] cmp $b->[1]} @$sheet; print Dumper \@sorted; __DATA__ HEADER 1,Beginning C,Beginning C1 2,Beginning C++,Beginning C++1 12,navy blue,navy blue1 3,Python Intro,Python Intro1 8,Baker's dozon,Baker's dozon1 9,Jumbo frames,Jumbo frames1 4,Acme cook book,Acme cook book1 5,Jumping Jack Flash,Jumping Jack Flash1 6,Zebra,Zebra1 7,Ace hardware,Ace hardware1 10,Attack show,Attack show1 11,car 54 where are you,car 54 where are you1 13,navy gold,navy gold1

        And this is a slightly improved (simpler) version using arrays instead or array refs. Also using the $. builtin input file line counter, instead of $sount.

        use strict; use warnings; use Data::Dumper; my @sheet; while( <DATA> ) { chomp; # skip header next if $. == 1; my @row = split( /,/, $_ ); push @sheet, [@row]; } my @sorted = sort {$a->[0] <=> $b->[0]} @sheet; #my @sorted = sort {$a->[1] cmp $b->[1]} @sheet; print Dumper \@sorted; __DATA__ HEADER 1,Beginning C,Beginning C1 2,Beginning C++,Beginning C++1 12,navy blue,navy blue1 3,Python Intro,Python Intro1 8,Baker's dozon,Baker's dozon1 9,Jumbo frames,Jumbo frames1 4,Acme cook book,Acme cook book1 5,Jumping Jack Flash,Jumping Jack Flash1 6,Zebra,Zebra1 7,Ace hardware,Ace hardware1 10,Attack show,Attack show1 11,car 54 where are you,car 54 where are you1 13,navy gold,navy gold1

        I hope this helps.

Re: Help with converting Python script to Perl for CSV sort
by Tux (Canon) on Feb 02, 2017 at 12:52 UTC

    Being the author/maintainer of Text::CSV_XS I cannot resist to show its strength. This is a oneliner in perl :)

    $ perl -MText::CSV_XS=csv -wE'csv(in=>[sort{$a->[1]cmp$b->[1]}@{csv(in +=>"test.csv")}])' 7,"Ace hardware","Ace hardware1" 4,"Acme cook book","Acme cook book1" 10,"Attack show","Attack show1" 8,"Baker's dozon","Baker's dozon1" 1,"Beginning C","Beginning C1" 2,"Beginning C++","Beginning C++1" 9,"Jumbo frames","Jumbo frames1" 5,"Jumping Jack Flash","Jumping Jack Flash1" 3,"Python Intro","Python Intro1" 6,Zebra,Zebra1 11,"car 54 where are you","car 54 where are you1" 12,"navy blue","navy blue1" 13,"navy gold","navy gold1" $ cat test.csv 1,Beginning C,Beginning C1 2,Beginning C++,Beginning C++1 3,Python Intro,Python Intro1 4,Acme cook book,Acme cook book1 5,Jumping Jack Flash,Jumping Jack Flash1 6,Zebra,Zebra1 7,Ace hardware,Ace hardware1 8,Baker's dozon,Baker's dozon1 9,Jumbo frames,Jumbo frames1 10,Attack show,Attack show1 11,car 54 where are you,car 54 where are you1 12,navy blue,navy blue1 13,navy gold,navy gold1

    The test with whitespace for readability:

    $ cat test.pl #!/pro/bin/perl use 5.18.2; use warnings; use Text::CSV_XS qw( csv ); # Read the CSV my $aoa = csv (in => *DATA); # Sort on second column my @srt = sort { $a->[1] cmp $b->[1] } @$aoa; # Output the sorted data as CSV csv (in => \@srt); __END__ 1,Beginning C,Beginning C1 2,Beginning C++,Beginning C++1 3,Python Intro,Python Intro1 4,Acme cook book,Acme cook book1 5,Jumping Jack Flash,Jumping Jack Flash1 6,Zebra,Zebra1 7,Ace hardware,Ace hardware1 8,Baker's dozon,Baker's dozon1 9,Jumbo frames,Jumbo frames1 10,Attack show,Attack show1 11,car 54 where are you,car 54 where are you1 12,navy blue,navy blue1 13,navy gold,navy gold1

    FWIW as of this week, Text::CSV also supports this in its pure perl Text::CSV_PP parts. (Yeah!)


    Enjoy, Have FUN! H.Merijn