curtisb has asked for the wisdom of the Perl Monks concerning the following question:

Hi, everyone it's been awhile since my last posting.. And awhile since I wrote perl code. Can someone please point me into the right direction on how to sort text into three columns from a comma delimited file. Thanks Curtisb

Replies are listed 'Best First'.
Re: Comma Delimited File
by broquaint (Abbot) on Nov 07, 2001 at 21:05 UTC
    Having had bad experience with Text::CSV_XS (I'm guessing it was probably me, not the module ;o) I tend to use hand-rolled solutions if it's nice'n'simple (I mean *really* simple), otherwise it's best to rely on the tried-and-tested APIs available. So anyway here's my meagre effort -
    use strict; $_ = <DATA>; my $cnt = tr/,/,/ + 1; my @fields; push @fields, split /,/ while <DATA>; chomp(@fields); my %data; push @{$data{$_ % $cnt}}, $fields[$_] for 0..@fields-1; my @row; for (keys %data) { @row = sort @{$data{$_}}; print "$_ - @row\n"; } __END__ num1,num2,num3 123,345,578 345,349,340 12,348,023 534,90283,230 239,394,283
    What this does is
    • counts the amount of fields (which in turn assumes theres a header line)
    • stores all the data in a flat array
    • get's rid of those danged newlines
    • creates a hash with the columns as the keys
    • then just loops through the data and sorts each column

    Sorry if this is a little direct in approach, but if you're rolling your own it might nudge you in the right direction (and I was looking for a challenge, simple though it is :o)
    HTH

    broquaint

    Update: it's probably best to go with davorg's approach if you actually want to implement this, but I'll leave this here in the spirit of TMTOWTDI.

      Your code has a pretty fundemental flaw where it assumes that all commas in the data are there to separate fields. CSV files allow commas to exist in quoted fields. That's why Text::CSV_XS (or even Text::ParseWords) are far better solutions.

      But, if I knew that this wasn't going to be an issue, I'd do it something like this:

      $_ = <DATA>; chomp; my @hdrs = split /,/; my @recs; while (<DATA>) { chomp; my %rec; @rec{@hdrs} = split /,/; push @recs, \%rec; }

      That seems, to me, to create a far more useful data structure - an array of records, with each record being a reference to a hash where the keys are the header fields and the values are the associated data values.

      Here's an example that then sorts the data on the second value in each record:

      print "num1\tnum2\tnum3\n"; foreach (sort { $a->{num2} <=> $b->{num2} } @recs) { print "$_->{num1}\t$_->{num2}\t$_->{num3}\n"; }
      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you don't talk about Perl club."

      Could you fill us in on the problems you've had with Text::CSV_XS? I've had no trouble so far using it with files generated by Excel.
        As I said, the problem was probably me and not Text::CSV_XS. But so as not to be elusive I'll fill you in as to why I gave up on it.

        Firstly I was working on something that had to done as fast as humanly (or inhumanly as the case sometimes is) possible, so when I encountered the problems with Text::CSV_XS I didn't spend too much time on figuring out what wrong.
        I wasn't doing anything particularly fancy or neat, just formatting some data from a TDF. But for some reason it would only find 4 of 9 records, and I can't see why. There wasn't anything too gruesome in the data (as it was created to be as friendly as possible) and splitting it on "\t" seemed to work a treat so I stuck with that.

        If you're interested in the code it was something like this -

        use strict; use Text::CSV_XS; my $tdf = Text::CSV_XS->new({'sep_char'=>"\t"}); open(TDF, "somedata.tdf") or die("b0rk - $!"); my @fields = (); while($tdf->parse(scalar <TDF>)) { @fields = $tdf->fields(); print "@fields\n"; }
        That seems to work fine on some simple data, but it broke on the data I was feeding it, so I'm guessing that the data was the root of the problem (and I would've dug further but I was under a *wee* bit of pressure ;o). I should imagine I'll use it if the need ever arises again, and hopefully the data behaves itself this time (assuming it was the data ...)
        HTH

        broquaint

Re: Comma Delimited File
by suaveant (Parson) on Nov 07, 2001 at 20:01 UTC
    Text::CSV_XS would probably help :)

                    - Ant
                    - Some of my best work - (1 2 3)

Re: Comma Delimited File
by busunsl (Vicar) on Nov 07, 2001 at 19:59 UTC

      This problem breaks neatly into two smaller ones. First, there is the problem of reading the CSV file. That problem may be (temporarily) simplified by assumimg the data has no embedded commas (or perhaps the file could be made to use tab or '~' or some other separator not part of the data).

      Then the problem simplifies to a sorting one (as I read the problem statement). I.e., sort a two-dimensional array of data by the first three columns (however many other columns there are).

      The simplest way to do this might then be this:

      # comparison routine for sort() assumes (for fun) that col1 and col3 a +re # alpha, and col2 is numeric. Obviously this would have to reflect th +e data sub by_first_3_cols { $a->[0] cmp $b->[0] || # first, compare col1 $a->[1] <=> $b->[1] || # if equal, compare col2 $a->[2] cmp $b->[2]; # if equal, compare col3 } # sort list of arrayrefs, each containing data from one line of input my @sorted = sort by_first_3_cols map { chomp; my @columns = split /,/; [ @columns ] } <DATA>; # now print sorted list in CSV format (or whatever) print join( ',', @$_), "\n" for @sorted;

      Let me acknowledge that the problem of extracting CSV data is more complex than simply splitting on ','. However, it may be made a simpler problem by specifying a different separator character--tab is my favorite, '|' (pipe) or '~' (tilde) work pertty well, too. Excel, for one thing, can write a tab-delimited file (having read the data from a comma-separated one).

      This neatly end-runs the whole Text::CSV_XS issue, if possible, which IMHO is a good thing.

      dmm