fishy has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm processing public budget data with Rakudo v2023.04 (Raku v6.d) using the module Text::CSV 0.012.

The input file has 150000 lines (records), and I'm trying to load just one column (field) with:

my Str $data = slurp $file_name;
my $fh = IO::String.new($data);
my $csv = Text::CSV.new(';', '"');
my @column = $csv.getline_all($fh).map( *[$target_col] );

When executing I get:

> raku clean_class.raku PPP_DSP_2002-2021.csv
Flattened array has 150000 elements, but argument lists are limited to 65535
  in method print at C:\workbench\budget\lib\IO\String.pm6 (IO::String) line 40
  in method new at C:\workbench\budget\lib\IO\String.pm6 (IO::String) line 26
  in method new at C:\workbench\budget\lib\IO\String.pm6 (IO::String) line 13
  in sub MAIN at clean_class.raku line 77
  in block at 'SETTING::'src/core.c/Main.pm6 line 421
  in sub RUN-MAIN at 'SETTING::'src/core.c/Main.pm6 line 416
  in block <unit> at clean_class.raku line 15

As a workaround, I substituted line 40 of IO::String

@!content.push: |@x;

with

@!content.push: $_ for @x;

and the script runs without complaints.
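
For comparison, a minimal sketch of why the original line fails and of one alternative to the loop. It assumes only core Raku; `@x` and `@content` here are hypothetical stand-ins for the variables inside IO::String, not the module's actual code. Slipping with `|@x` passes every element as a separate argument, whereas `append` receives the array as a single argument and adds its elements internally.

```raku
# Hypothetical stand-in for the 150000 lines held in @x inside IO::String.
my @x = 'line' xx 150_000;

my @content;

# Slipping turns every element into its own argument, which is what runs
# into the 65535 argument-list limit reported above:
#     @content.push: |@x;

# append takes the whole array as ONE argument and adds its elements
# one by one internally, so the argument-list limit is not involved.
@content.append: @x;

say @content.elems;   # 150000
```

Whether `append` is an acceptable change for IO::String itself is an assumption; it only shows that the flattening can be done without a slip.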

I'm wondering if some other solution could be feasible?

Thanks and greetings.

Re: [Raku] Limited argument list issue in IO::String (Text::CSV)
by kcott (Archbishop) on May 03, 2023 at 14:24 UTC

    G'day fishy,

    I work with biological data that's often of comparable size — I have a 2GB CSV file that I use for volume testing. I would never attempt to slurp an entire file of this size; instead, I would read and process line-by-line. Apart from the huge memory overhead, you're reading the data twice: once to slurp; again to process.

    I don't have Raku available. Here's a quick-and-dirty example in Perl5. I've mostly used the same variable names as your code; hopefully, a conversion to Raku would not be too difficult for you.

    Input CSV:

    $ cat dummy.csv
    a,b,c
    d,e,f
    g,h,i

    Perl5 script column_extract.pl:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use constant TARGET_COL => 1;

    use Text::CSV;

    my $file_name = 'dummy.csv';
    my @columns;
    my $csv = Text::CSV::->new();

    open my $fh, '<', $file_name;

    while (my $row = $csv->getline($fh)) {
        push @columns, $row->[TARGET_COL];
    }

    print "@columns\n";

    Output:

    $ ./column_extract.pl
    b e h
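
    The same line-by-line idea can be sketched in core Raku. This is only an illustration of streaming the file instead of slurping it: it deliberately does not use Text::CSV, the sample file and the names `$file-name` and `$target-col` are hypothetical, and the naive `split` assumes no quoted fields containing the separator or embedded newlines.

```raku
# Hypothetical sample file, written here only so the sketch is self-contained.
my $file-name  = $*TMPDIR.add('dummy-semicolon.csv');
my $target-col = 1;
$file-name.spurt: "a;b;c\nd;e;f\ng;h;i\n";

my @column;

# IO::Path.lines is lazy, so the file is read one line at a time
# rather than being held in memory as a whole.
for $file-name.lines -> $line {
    # Naive split: assumes no quoted fields containing ';' or newlines.
    @column.push: $line.split(';')[$target-col];
}

say @column;   # [b e h]

# Clean up the temporary sample file.
$file-name.unlink;
```

    For real budget data with quoted fields, a proper CSV parser is still the safer choice; the loop structure is the point here.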

    — Ken

Re: [Raku] Limited argument list issue in IO::String (Text::CSV)
by Tux (Canon) on May 03, 2023 at 15:03 UTC
    $ cat test.raku
    #!raku

    use Slang::Tuxic;
    use Text::CSV;

    say "All of it";
    my @aoh1 = csv (in => "test.csv", out => Hash);
    dd @aoh1;
    say "Fragment col 2";
    my @aoh2 = csv (in => "test.csv", out => Hash, fragment => "col=2");
    dd @aoh2;
    $ cat test.csv
    "No.","Name"
    "1","hello"
    "2","bye"
    $ raku test.raku
    All of it
    Array @aoh1 = [{:Name("hello"), "No." => "1"}, {:Name("bye"), "No." => "2"}]
    Fragment col 2
    Array @aoh2 = [{:Name("hello")}, {:Name("bye")}]

    If you want a ; as separator, just add sep => ";".
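
    A minimal sketch of that variant, assuming the same `csv` call as above with only `sep => ";"` added. The semicolon-separated sample file is hypothetical and is written to a temporary path just to keep the example self-contained; it needs the Text::CSV module installed.

```raku
use Text::CSV;

# Hypothetical semicolon-separated sample, written so the sketch is self-contained.
my $file = $*TMPDIR.add('test-semicolon.csv');
$file.spurt: q:to/END/;
    "No.";"Name"
    "1";"hello"
    "2";"bye"
    END

# Same call as above, plus sep => ";" for the separator.
my @aoh = csv(in => $file.Str, sep => ";", out => Hash, fragment => "col=2");
dd @aoh;

# Clean up the temporary sample file.
$file.unlink;
```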

    You are allowed to use other ways to write it. Just read the docs.


    Enjoy, Have FUN! H.Merijn
Re: [Raku] Limited argument list issue in IO::String (Text::CSV)
by cavac (Prior) on May 03, 2023 at 13:33 UTC

    I don't use Rakudo. But the number of lines hints to me that (maybe) you should start looking into using a database.

    You didn't really tell us what you are trying to accomplish (the "big picture"), only that you are working with budget data. I'm assuming here that the general rule of "the amount of data will grow exponentially over time" holds true for your project as well. A modern relational database like PostgreSQL, MySQL, Oracle or even (oh god oh god, I don't believe I'm saying this) Microsoft SQL Server (please kill me!) could serve you better in the long run than doing your data crunching/reporting/whatever directly on some CSV file.

Re: [Raku] Limited argument list issue in IO::String (Text::CSV)
by Anonymous Monk on May 03, 2023 at 13:26 UTC
    Why not do it the way the docs show?
    my @dta = csv(in => "file.csv");
Re: [Raku] Limited argument list issue in IO::String (Text::CSV)
by fishy (Friar) on May 04, 2023 at 15:04 UTC

    Thank you everybody for your comments!

    Here is the whole use case, with script and sample input data.

    Greetings.