in reply to Need to sort comma delimited, double quoted file

The are multiple examples freely available to do exactly that. One of the tools that enables you to do this from the command-line is xlscat as available in the Spreadsheet::Read distribution. That will (also) use Text::CSV_XS to parse your CSV data

$ xlscat --sort=31n,1n,2n file.csv

As your CSV however is not valid CSV (see ,"Max Smart Super, Speciality Hospital "O"", where the double quotes inside the field are not escaped), you might need to roll your own script.

A really quick braindump that works on the snippet you showed is:

use 5.18.2; use warnings; use Data::Peek; use Text::CSV_XS qw( csv ); my $aoa = csv (in => "test.csv", escape => undef, allow_loose_quotes = +> 1); csv (in => [ sort { $a->[30] <=> $b->[30] || $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @$aoa ]);

Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^2: Need to sort comma delimited, double quoted file
by CSharma (Sexton) on Jul 26, 2017 at 06:55 UTC
    I tried the same logic to get my job done as found somewhere but used Text::CSV. Thanks for help Merijn!!
    my @sorted = sort {$a->[30] <=> $b->[30] || $a->[0] <=> $b->[0] || $a- +>[1] <=> $b->[1]} @$ref;

    Moreover, I'm stuck with other things now: i.) The script runs on my local machinewindows, strawberry perl, but fails on linux servers; looks like the files I'm getting are created on windows. Error captured is:: Line could not be parsed: Inappropriate ioctl for device2032EIF - CR char inside unquoted, not part of EOL12645948 which is because of Carriage return, can't see that with 'vi'. How can I handle this, any help?

    ii.) And yes file has inner double quotes in fields as given above; so get below error while executing this; can we handle this?: Line could not be parsed: Inappropriate ioctl for device2032EIF - CR char inside unquoted, not part of EOL12645948 at ./max_sort.pl line 69, <$data> line 60. # CSV_XS ERROR: 2023 - EIQ - QUO character not allowed

    my ($file, $sfile) = @_; my $csv = Text::CSV->new({binary => 1, decode_utf8 => 1, auto_ +diag => 1, allow_loose_quotes => 1}); open(my $data, '<:encoding(utf8)', $file) or die "Could not op +en '$file' $!\n"; my $i = 0; my $ref; my $lineCt = 1; my @header; while(my $line = <$data>){ chomp $line; if($lineCt == 1){ @header = $csv->fields(); $lineCt++; next; } if($csv->parse($line)){ my @fields = $csv->fields(); $ref->[$i] = \@fields; $i = $i+1; } else{ warn "Line could not be parsed: $!",$csv->erro +r_diag; } }

      That code would fail if *any* of the lines contains embedded newlines.

      I'd simplify the code to this, assuming the header has no embedded newlines:

      my $csv = Text::CSV_XS->new ({ binary => 1, decode_utf8 => 1, auto_diag => 1, allow_loose_quotes => 1, }); open my $data, "<:encoding(utf-8)", $file or die "$file: $!\n"; my $ref; my @header = $csv->header ($data); while (my $row = $csv->getline ($data)) { push @$ref, $row; }

      In CSV don't let perl deal with the EOL. In Text::CSV and Text::CSV_XS the differences between Windows EOL and Linux EOL aur fully automatically dealt with within the definition of legal CSV.

      I cannot tell anything about inappropriate ioctl calls regarding to CSV parsing. Neither on Linux nor on Windows. If the error is reproducable, I'd need the CSV file in full (preferably in a ZIP to ensure no binary conversions take place).


      Enjoy, Have FUN! H.Merijn
        Thanks everyone for help!! I got things worked with below options in my case.
        my $csv = Text::CSV->new({binary => 1, decode_utf8 => 1, auto_diag => +1, allow_loose_quotes => 1, allow_loose_escapes => 1});

        Can anybody please help, how to handle embedded quotes in this case (with Text::CSV module? See the HTML field below:

        7252798,5830,"Glycosylated Haemoglobin (HbA1C) EDTA",1656,"template",, +"<HTML><HEAD><META NAME=\"GENERATOR\" Content=\"Microsoft DHTML Editi +ng Control\"><TITLE></TITLE></HEAD><BODY><P>&nbsp;</P></BODY></HTML>" +,2017-07-23 15:15:27,0,"",N,"",,2017-07-23 15:15:27,MM0165818,6,"All +Tests Done and Verified",,MSIN743908,2017-07-22 16:34:07,10,"Max Supe +r Speciality Hospital-Saket",27,"Poonam S Das",0,"S GUPTA",False,"S", +"GUPTA",SKMS,355419,21278,"Puneet Agarwal",1945-01-23 00:00:00,0,NUL +L,"9810006763","INFO@MAXHEALTHCARE.COM",OP,NO,Verified,2017-07-22 16: +34:07,2017-07-22 16:34:07,Lab,697693,0,"",M
        Error:

        # CSV_XS ERROR: 2023 - EIQ - QUO character not allowed @ rec 1981 pos 110 field 7 Line could not be parsed: Inappropriate ioctl for device2023EIQ - QUO character not allowed11019827