1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am wondering about the most elegant, performant, and robust way to select data from a database and output a delimited file (e.g. CSV). We use MySQL so manually running SELECT ... INTO OUTFILE ... is always possible, and I have colleagues who think that's fine since "we know there will *never* be a tab in the data" ... Personally I hate shelling out to begin with and would always rather use Perl, and I *never* trust that there will *never* be an occurrence of something that can *never* happen. We use DBIx::Class so in application code we are always querying on a DBIC ResultSet.

Naturally I can select data into an AoA with selectall_arrayref() and just use csv() from Text::CSV_XS, but I am wondering if there isn't an interface. I see that perlancar has released DBIx::CSV which provides methods like $dbh->selectall_csv( $sql ), and I have tried it, and it worked. But I have two reservations about it: (1) It extends DBI rather than DBIx::Class, so would require me to "get my hands dirty" selecting the data with DBI, and (2) while the author has created a panoply of interdependent modules, I am always leery of one-person ecosystems, and in this case the backend is another of his/her modules, Text::Table::CSV. I am just solidly in the Text::CSV_XS camp at this point and don't plan to introduce another CSV-handling library.
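To make the baseline concrete, here is an untested sketch of the selectall_arrayref() + csv() approach described above. It uses an in-memory SQLite database (DBD::SQLite) as a stand-in for the real MySQL handle; the table and column names are illustrative, and the pattern is the same either way:

```perl
use strict;
use warnings;
use DBI;                       # assumes DBD::SQLite is installed
use Text::CSV_XS qw( csv );

# In-memory stand-in database with a little sample data.
my $dbh = DBI->connect ("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1 });
$dbh->do ("CREATE TABLE artist (artistid INTEGER, name TEXT)");
$dbh->do ("INSERT INTO artist VALUES (1, 'Enya'), (2, 'Mariah Carey')");

# Pull everything into an AoA, prepend a header row, write in one go.
my $rows = $dbh->selectall_arrayref ("SELECT artistid, name FROM artist");
csv (out => "artist.csv", in => [ [qw( artistid name )], @$rows ]);
```

This is the simple all-in-memory form; as noted below in the thread, a streaming fetch avoids holding the whole result set at once.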

(Is DBIx::Class extensible, as in, could one add a method so that something like $rs->search({...})->csv() works?)

Thanks for your wisdom.

Edit: clarify proposed DBI solution


The way forward always starts with a minimal test.

Replies are listed 'Best First'.
Re: Select DB data into CSV file
by Tux (Canon) on Dec 20, 2018 at 13:02 UTC

    Feeding Text::CSV_XS a complete AoA is very inefficient. Please read the examples that show you how to do this streaming.

    I should update that section to make use of the relatively recent undef_str attribute.


    Enjoy, Have FUN! H.Merijn

      Sorry, sloppy writing. I meant this:

      # using the csv function, all in memory
      csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));


      The way forward always starts with a minimal test.

        This is fine until your table outgrows your memory: selectall_arrayref will read all rows into memory and then pass them to csv (), whereas using a fetch handle has virtually no memory overhead.

        my $sth = $dbh->prepare ("select * from foo");
        $sth->execute;
        csv (out       => "foo.csv",
             in        => sub { $sth->fetch },
             undef_str => "\\N");

        Enjoy, Have FUN! H.Merijn
Re: Select DB data into CSV file
by perlancar (Hermit) on Dec 20, 2018 at 16:59 UTC

    Hi 1nickt,

    Thanks for the mention. I linked DBIx::CSV to Text::Table::CSV instead of Text::CSV_XS directly because DBIx::CSV is just a special case of DBIx::TextTableAny, which outputs to various forms of text tables, of which CSV is one.

    I wrote a short blog post on DBIx::CSV here and noted that you can also do something like:

    use Text::CSV;
    my $csv = Text::CSV->new;
    ...
    $csv->say (\*STDOUT, $sth->{NAME});
    while (my $row = $sth->fetchrow_arrayref) {
        $csv->say (\*STDOUT, $row);
    }

    which doesn't require any additional module and doesn't take much extra code anyway. But since you do not use DBI directly, that's a bit of a problem as well.

    I haven't looked into DBIx::Class to know if it's easily extensible to do what you want. But how about a patch-based solution like Patch::DBI::WriteCSV (use this if the link doesn't exist yet). As long as something like DBIx::Class calls one of DBI's fetchrow_{array,arrayref,hashref} or fetchall_{arrayref,hashref}, it will work. I've tested it with the simple sample database from the DBIx::Class documentation. This code:

    Patch::DBI::WriteCSV->import;
    my @all_artists = $schema->resultset('Artist')->all;
    Patch::DBI::WriteCSV->unimport;
    will produce something like:
    artistid,name
    1,Enya
    2,"Mariah Carey"

    If you don't want to depend on my module, you can just use the underlying code and integrate that into your own code base. As for the "one-person ecosystem" remark: there's a line between NIH syndrome and "not finding what I want/need on CPAN", and I can say that I am still in the second camp. But I can't help it if what I want/need is not on CPAN and thus write a module for it :-) I do use other people's modules. Of my 1512 distributions on CPAN, I depend on 610 modules written by others.

    % lcpan dists --author PERLANCAR | xargs lcpan deps | grep -v PERLANCAR | wc -l

      LOL perlancar, you are amazing. I didn't expect to get a new distro out of my question! Thanks for your reply which I will study this evening, very kind.

      ( P.S. The blog post you linked to was how I found DBIx::CSV in the first place ;-) )


      The way forward always starts with a minimal test.
Re: Select DB data into CSV file
by Your Mother (Archbishop) on Dec 20, 2018 at 21:27 UTC

    That patch thing from perlancar is *wild* and I'm glad to learn of it. It might be best for your problem.

    DBIx::Class is a bit of a mismatch because CSV, as I see it, is closer to a View than to a data format, and getting DBIC to operate at that level (per-row CSV writing to strings instead of a stream) carries a large performance penalty. You could do something like–

    my $rs = $schema->resultset("UrTable")->search({ ur => "search" });
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 })
        or die "Cannot use CSV: ", Text::CSV_XS->error_diag;
    my @headers = $rs->result_source->columns;
    $csv->say( \*STDOUT, [ @headers ] );
    my $cursor = $rs->cursor;
    while ( my @vals = $cursor->next ) {
        $csv->say( \*STDOUT, [ @vals ] );
    }

    It's easy enough to drop in some processing in an as_csv method in your App::Schema::Result::UrTable package, but it would be an expensive way to get it. If I were going to break down and do it, I'd do a custom component class to inherit from; I've done this for JSON. You could do something more meta in App::Schema::ResultSet::UrTable… but I'm not sure what that would look like. I would look at the source of DBIx::Class::ResultClass::HashRefInflator as the basis for ideas.
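    For what it's worth, a ResultSet-level csv() method might be sketched like this (untested; MyApp::Schema::ResultSet is a hypothetical shared ResultSet base class, and the method name is arbitrary):

```perl
package MyApp::Schema::ResultSet;
use strict;
use warnings;
use base 'DBIx::Class::ResultSet';
use Text::CSV_XS;

# Hypothetical csv() method: stream the raw (uninflated) rows of any
# resultset to a filehandle, header row first, via the cursor.
sub csv {
    my ($self, $fh) = @_;
    $fh ||= \*STDOUT;
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" })
        or die "Cannot use CSV: ", Text::CSV_XS->error_diag;
    $csv->print ($fh, [ $self->result_source->columns ]);
    my $cursor = $self->cursor;
    while (my @vals = $cursor->next) {
        $csv->print ($fh, \@vals);
    }
    return $self;
}

1;
```

    Wired up as the default resultset class (e.g. load_namespaces with default_resultset_class), $schema->resultset('Artist')->search({ ... })->csv($fh) would then work.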

      If performance is an issue, do not take a copy of the list every time, use the reference:

      my $rs = $schema->resultset ("UrTable")->search ({ ur => "search" });
      my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" })
          or die "Cannot use CSV: ", Text::CSV_XS->error_diag;
      my @headers = $rs->result_source->columns;
      $csv->print (*STDOUT, \@headers);  # ^
      my $cursor = $rs->cursor;
      while (my @vals = $cursor->next) {
          $csv->print (*STDOUT, \@vals); # ^
      }

      I should also note that say has runtime overhead. For top speed, use the eol attribute and print.

      *STDOUT does not require the backslash in Text::CSV_XS (I just verified with perl 5.8.1 and Text::CSV_XS 1.12).

      Performance-wise it is a shame that DBIx::Class::Cursor's next (go to DBIx::Class::ResultSet's next for the actual documentation) uses DBI's fetchrow_array instead of the faster and more efficient fetchrow_arrayref (or its alias fetch), which could then be passed directly to $csv->print.


      Enjoy, Have FUN! H.Merijn
Re: Select DB data into CSV file
by kschwab (Vicar) on Dec 20, 2018 at 15:56 UTC
    Mysql's "INTO OUTFILE" supports "ESCAPED BY" "ENCLOSED BY" and other parameters that you could probably tweak to make it safe.
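    For illustration, a tuned export statement might look like this (table, column, and path names are made up; the server writes the file, so the path must be writable by mysqld):

```sql
SELECT artistid, name
  FROM artist
  INTO OUTFILE '/tmp/artist.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\n';
```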
      Mysql's "INTO OUTFILE"

      SELECT ... INTO OUTFILE writes to a file on the database server's filesystem (if permitted), whereas a classic SELECT + Text::CSV as proposed in Re^3: Select DB data into CSV file writes to the client's filesystem. And SELECT ... INTO OUTFILE is not portable across databases.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Sure, but the original post mentions the team is considering it, just with the unsafe assumption that TAB characters won't be in the data.