PerlMonks  

Re: Complex file manipulation challenge

by Tux (Canon)
on Aug 13, 2019 at 17:51 UTC [id://11104401]


in reply to Complex file manipulation challenge

Untested (assuming 2500 lines as the cut-off):

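    use strict;
    use warnings;
    use Text::CSV_XS qw( csv );
    use PerlIO::via::gzip;

    # All *.csv files in the current directory, smallest first (-s is the size)
    my @csv_files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, -s ] } glob "*.csv";
    # Read every file into memory as rows, dropping each file's header row
    my @csv = map { @{csv (in => $_, headers => "skip")} } @csv_files;
    my $n = "0000";    # string increment keeps the padding: 0000, 0001, ...
    while (@csv) {
        my $file = "new" . $n++ . ".csv.gz";
        open my $fh, ">:via(gzip)", $file or die "$file: $!";
        # splice takes whatever is left when fewer than 2500 rows remain
        csv (in => [ splice @csv, 0, 2500 ], out => $fh);
        close $fh;
        }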
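The batching in the loop can be checked in isolation with core Perl only. chunk_names is a hypothetical helper, not part of the post: it splices a list into batches of at most $size rows and records the file name each batch would get. Two details matter. First, splice with a length larger than what remains simply returns the remainder, so a plain 2500 needs no ternary; a length of $#csv, by contrast, is the last index (one less than the count), so the final pass would leave one row behind and every later pass would splice nothing, never ending the loop. Second, the magic string increment on "0000" keeps the zero padding.

```perl
use strict;
use warnings;

# Hypothetical helper: split @rows into consecutive batches of at most $size,
# returning [ file name, batch size ] for each batch the loop would write.
sub chunk_names {
    my ($size, @rows) = @_;
    my $n = "0000";    # magic string increment: "0000" -> "0001" -> ...
    my @out;
    while (@rows) {
        # splice returns only what is left once fewer than $size rows remain,
        # so the short final batch needs no special case and the loop ends
        my @batch = splice @rows, 0, $size;
        push @out, [ "new" . $n++ . ".csv.gz", scalar @batch ];
        }
    return @out;
    }
```

For 6001 rows and a size of 2500 this yields new0000.csv.gz, new0001.csv.gz and new0002.csv.gz holding 2500, 2500 and 1001 rows.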

Enjoy, Have FUN! H.Merijn

Re^2: Complex file manipulation challenge
by haukex (Archbishop) on Aug 13, 2019 at 18:01 UTC

    Nice! Note that the original problem statement includes this:

    Note: Your program should work when files are much bigger than memory in your JVM and must close all open resources correctly

      I did not read the original problem statement :)

      csv (in => $fh, out => undef, on_in => sub { ... }); supports streaming and does not store anything in memory other than the current record. Rewriting my version to do that is left as an exercise for the reader.
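One way to attempt that exercise, sketched under assumptions rather than presented as Tux's method: it swaps the csv() function and its on_in callback for the Text::CSV_XS object interface (getline, combine, string), and writes through the core module IO::Compress::Gzip instead of PerlIO::via::gzip. It assumes Text::CSV_XS is installed; the helper name split_csv_stream and its $prefix parameter are hypothetical. Only the current record is held in memory, and every input and output handle is closed explicitly, as the quoted requirement asks.

```perl
use strict;
use warnings;
use Text::CSV_XS;
use IO::Compress::Gzip qw( $GzipError );

# Hypothetical helper: stream @files (skipping each file's header row) into
# gzipped chunks of at most $max records named "${prefix}0000.csv.gz", ...
# Only one record is in memory at a time. Returns the number of chunks written.
sub split_csv_stream {
    my ($max, $prefix, @files) = @_;
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    my ($n, $out, $rows, $chunks) = ("0000", undef, 0, 0);
    for my $file (@files) {
        open my $in, "<", $file or die "$file: $!";
        $csv->getline ($in);                  # discard this file's header row
        while (my $row = $csv->getline ($in)) {
            if (!$out || $rows == $max) {     # no chunk yet, or it is full
                $out->close if $out;
                my $name = $prefix . $n++ . ".csv.gz";
                $out = IO::Compress::Gzip->new ($name)
                    or die "$name: $GzipError";
                ($rows, $chunks) = (0, $chunks + 1);
                }
            $csv->combine (@$row) or die "cannot combine record";
            $out->print ($csv->string . "\n");
            $rows++;
            }
        close $in;
        }
    $out->close if $out;    # flush and close the last chunk
    return $chunks;
    }
```

Passing 2500, "new" and the size-sorted @csv_files from the original post would write new0000.csv.gz, new0001.csv.gz, ... in the current directory; a chunk is only opened once a record is ready for it, so no empty trailing file is created.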

      In preparation I found that PerlIO::via::gzip *only* supports open my $fh, ":via(gzip)", "file.gz"; and *not* open my $fh, ">", "file.gz"; binmode $fh, ":via(gzip)"; :( :(


      Enjoy, Have FUN! H.Merijn
        csv (in => $fh, out => undef, on_in => sub { ... }); supports streaming and does not store in memory

        Nice, good to know!

        I found that PerlIO::via::gzip *only* supports open my $fh, ":via(gzip)", "file.gz";

        Too bad, but IO::Compress::Gzip (a core module), which I showed, should support that AFAIK.
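A minimal sketch of the pattern PerlIO::via::gzip reportedly does not allow (open a plain handle first, then add compression), using only core modules: IO::Compress::Gzip->new accepts an already opened filehandle as its output, and AutoClose => 1 makes closing the compressor close that handle too. write_gz_lines is a hypothetical helper name used for illustration.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw( $GzipError );

# Hypothetical helper: open a plain handle, then wrap it in a gzip compressor
# and write @lines through it. Both handles are closed before returning.
sub write_gz_lines {
    my ($file, @lines) = @_;
    open my $fh, ">", $file or die "$file: $!";    # ordinary, already open handle
    binmode $fh;                                   # gzip output is binary data
    my $z = IO::Compress::Gzip->new ($fh, AutoClose => 1)
        or die "$file: $GzipError";
    $z->print ($_) for @lines;
    $z->close or die "$file: $GzipError";          # AutoClose also closes $fh
    return;
    }
```

Reading the file back with IO::Uncompress::Gunzip (also core) returns exactly the lines that were written.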
