in reply to perl group by and sort from a csv input file

As you can see, you have an embarrassment of riches in the replies. Some of them are purposefully terse and idiomatic because it's something of a sport here when a SoPW doesn't give code attempts. :P If you work with one of them and have follow-up questions, don't hesitate to ask, but post whatever code you tried to use.

You got an interesting but terminally slothful, half-right dose of self-congratulation regarding the use of a database. A database is not necessary, of course; several of the replies neatly addressed your actual question and need, and they are trivial to adapt to many other requirements if you can follow the code.

That said, some people find SQL a more natural way of working with data, so it is an interesting and potentially useful thing to do; there is no try®. Like so many things, it is semi-trivial in Perl if you know how. Building on the previous answers, here's how–

#!/usr/bin/env perl
use strict;
use warnings;

use Text::CSV_XS "csv";
use DBI;

my $csv_file = shift || die "Give a CSV file with sample data\n";

my $dbh = DBI->connect("dbi:SQLite::memory:"); # DB is ":memory:"

# NB: the <<"" heredocs below end at the first empty line.
$dbh->do(<<"");
CREATE TABLE sampleData( sample, input, amount )

my $insert_h = $dbh->prepare(<<"");
INSERT INTO sampleData VALUES( ?, ?, ? )

# Insert each well-formed (3 column) row of the input CSV.
csv( in    => $csv_file,
     on_in => sub {
         my @values = @{ $_[1] };
         $insert_h->execute(@values) if @values == 3;
     } );

my $tallies = $dbh->selectall_arrayref(<<"");
SELECT sample, input, SUM(amount)
  FROM sampleData
 GROUP BY sample, input
 ORDER BY sample, input

csv( in => $tallies, out => "outputfile.csv" );

You will need fairly recent versions of a couple of modules for this to run: Text::CSV_XS and DBD::SQLite.
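If you're not sure what you have, a quick one-liner to print the installed versions (this just reports what's there; I won't swear to the exact minimums required):

perl -MText::CSV_XS -MDBD::SQLite -E 'say "Text::CSV_XS $Text::CSV_XS::VERSION"; say "DBD::SQLite  $DBD::SQLite::VERSION"'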

For an excellent overview of this kind of DBI usage, see DBI recipes. And a footnote for working with the data outside of Perl–

$dbh->sqlite_backup_to_file("newDBname.sqlite");
# ^^^ To go from ":memory:" to a file. Then you also have access to
#     the DB via the command line with the sqlite executable.
#
#   moo@cow[2574]~>sqlite3 "newDBname.sqlite"
#   sqlite> select * from sampleData;
#   3211111|100|3.2
#   3211112|101|3.2
#   ...et cetera...

Update: s/CVS/CSV/g for @all_the_nodes;#!!!

Re^2: perl group by and sort from a csv input file
by Tux (Canon) on Jul 28, 2017 at 20:15 UTC

    Then why not use DBD::CSV directly?

    $ cat test.csv
    3211111,100,3.2
    3211112,101,3.2
    3211111,100,1.2
    3211112,100,2.2
    3211113,100,5.2
    3211112,100,0.3

    $ cat test.pl
    use 5.18.2;
    use warnings;

    use DBI;
    use Text::CSV_XS qw(csv);

    my $dbh = DBI->connect ("dbi:CSV:");
    $dbh->{csv_tables}{sampleData} = {
        file      => "test.csv",
        col_names => [qw( sample input amount )],
        };

    csv (in => $dbh->selectall_arrayref ("
        SELECT   sample, input, SUM (amount)
        FROM     sampleData
        GROUP BY sample, input
        ORDER BY sample, input"));

    $ perl test.pl
    3211111,100,4.4
    3211112,100,2.5
    3211112,101,3.2
    3211113,100,5.2

    Enjoy, Have FUN! H.Merijn

      Oh, I don't think a DB was necessary at all. But if you're going to use that idiom, it's better, I argue, to put it into a DB so you can use the DB tools. As I think I've said to you before, all ++s to you for your CVS CSV (update: DERPy fingers) related work. :P

Re^2: perl group by and sort from a csv input file
by erix (Prior) on Jul 28, 2017 at 20:40 UTC

    It would be absurd to install Postgres just for this functionality, but if it's already available it offers at least two attractive alternatives. I'll just put my test file here:

    #!/bin/sh echo "OPTION 1: DATA FULLY IMPORTED in temp table" time ( < INPUTFILE.csv psql -c " drop table if exists t; create temporary table t(i1 int, i2 int, n3 numeric); copy t from stdin with (format csv, delimiter ','); copy ( select i1,i2,sum(n3) from t group by i1,i2 order by i1,i2 ) to stdout with(format csv, delimiter ','); " ) echo "-- OPTION 2: DATA read via Foreign Data Wrapper (file_fdw)" echo " drop foreign table if exists inputfile cascade; drop server pgcsv cascade; " | psql -qX time ( echo " create server pgcsv foreign data wrapper file_fdw; create foreign table inputfile ( i1 int, i2 int, n3 numeric ) server pgcsv options ( filename '/tmp/INPUTFILE.csv', format 'csv' ); copy ( select i1,i2,sum(n3) from inputfile group by i1,i2 order by i1, +i2 ) to stdout with(format csv, delimiter ',') " | psql )

    I thought it was interesting to see the different timings:

    -- OPTION 1: DATA FULLY IMPORTED in temp table
    real    0m0.048s
    user    0m0.002s
    sys     0m0.006s

    -- OPTION 2: DATA read via file_fdw (foreign data wrapper)
    real    0m0.038s
    user    0m0.002s
    sys     0m0.006s

    -- Your program
    real    0m0.051s
    user    0m0.048s
    sys     0m0.003s

    Foreign tables, via the file_fdw foreign data wrapper, are handy in the gray area between file and database. The next step towards a full database is a materialized view (of a foreign table), which offers more real DB features (indexing, for one); a minimal sketch follows.
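    For instance, reusing the inputfile foreign table from the script above (the view and index names here are made up for illustration):

    -- Postgres can't index a foreign table directly, but it can
    -- index a materialized view built on top of one.
    create materialized view inputfile_mv as
        select i1, i2, sum(n3) as total
        from   inputfile
        group  by i1, i2;

    create index inputfile_mv_idx on inputfile_mv (i1, i2);

    -- re-run the aggregation against the CSV whenever the file changes
    refresh materialized view inputfile_mv;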