Ppeoc has asked for the wisdom of the Perl Monks concerning the following question:

I have a 17 MB CSV file that needs to be parsed. I parsed it and saved it as a multidimensional array:
    while (my $line = <$in_ph>) {
        chomp $line;
        my @fields = split(/,/, $line);
        @key = split(/_/, $fields[0]);
        $data[$count][0] = $fields[1];
        foreach (@key) {
            $data[$count][$k + 1] = $key[$k];
            $k++;
        }
        $k = 0;
        $count++;
    }
I now need to access only certain elements of each row and column, and I am running out of memory while doing so. Please help me out with this:
    my $n = 2033;
    my $color = $data[$n][0];
    my $order = $data[$n + 1][0];
    my $shape = $data[$n + 3][0];
    my $name = $data[$n + 4][0];
    foreach my $aref (@data[2033..$#data]) {
        if (($aref - $n) % 6267 == 0) {
            $color = $data[$aref][0];
            $order = $data[$aref + 1][0];
            $shape = $data[$aref + 3][0];
            $name = $data[$aref + 4][0];
        }
        print $out_ph1 $predata, ",", $color, ",", $order, ",", $shape, ",", $name, ",";
        foreach (@{ $aref }[4..$#$aref]) {
            print $out_ph1 $_, "\_";
        }
        print $out_ph1 "\n";
    }
I need to parse data from row 2033 onwards. I have initialised $color, $order, $shape and $name to certain values from @data. I now have a for loop that traverses from row 2033 to the end of @data. After every 6267 rows traversed, I want to change the values of color, order, shape and name, and I am using the if statement for that. At the same time, I want color, order, shape and name joined with the data from column 4 onwards for each row. I keep getting the warning "use of uninitialized value in join" for the line below:
    print $out_ph1 $predata, ",", $color, ",", $order, ",", $shape, ",", $name, ",";
I am also running out of memory before the loop completes. Please help me! I am not sure whether the cause is the way I parsed the CSV file or the way I am printing the results.

Re: Running out of memory while running this for loop
by BillKSmith (Monsignor) on Nov 01, 2015 at 18:18 UTC

    I suspect that you are confusing Perl's references with C's pointers. A Perl reference-to-an-array is exactly what its name implies. What it actually contains, or how it works internally, is of no concern to you. It makes no sense to use it in an arithmetic expression.

    Perl does not have a "multi-dimensional array" data type. Perl simulates it very well with the structure known as an "array of arrays". (This is actually an array of references, each of which refers to an array.) Perl provides a syntax with multiple indices to conveniently access individual elements of this structure. You neither know nor care where the individual arrays are stored; you have a reference to each one.
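A minimal sketch of the array-of-arrays structure described above, using made-up rows: each element of @data holds an array reference, and $data[1][2] is shorthand for $data[1]->[2].

```perl
use strict;
use warnings;

# Made-up sample rows; each element of @data is a reference to an anonymous array.
my @data = (
    [ 'red',  'A', 'x' ],
    [ 'blue', 'B', 'y', 'z' ],
);

# $data[1][2] is shorthand for $data[1]->[2]: dereference row 1, take column 2.
my $cell = $data[1][2];

# Each row is a separate array, so rows may have different lengths.
my $row_ref   = $data[1];      # an ARRAY reference, not a number
my $row_width = @{$row_ref};   # array in scalar context gives its length

for my $i (0 .. $#data) {
    print "row $i: ", join(',', @{ $data[$i] }), "\n";
}
```

The loop variable $i is a plain integer index; the row itself is only ever reached through the reference stored at $data[$i].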

    I doubt that your data is stored correctly. I cannot be sure without a reasonable sample of your data.

    Bill
Re: Running out of memory while running this for loop
by shmem (Chancellor) on Nov 01, 2015 at 17:06 UTC
    I am also running out of memory before the loop completes.

    What is your idea of $aref in the loop that blows up? Is it an integer or an array reference? What is the result of adding 1 to an array reference? Or 2? What happens to an array (or array reference) if you index it with the result of that addition?

        my $aref = [];
        print $aref,"\n";
        print $aref + 1,"\n";
        print $aref + 2,"\n";
        print "mem: ",qx{ps -o vsz= $$};
        $aref[$aref +2] = "foo";
        print "mem: ",qx{ps -o vsz= $$};
        __END__
        ARRAY(0xf5dba0)
        16112545
        16112546
        mem: 28964
        mem: 154844

    Adding an integer to $aref converts its hexadecimal address to an integer. Using that integer as an index extends @data and pre-allocates every slot up to it, each containing a NULL value. If the address of $aref is high enough, your program hits the memory limit of your machine.
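A small sketch of the two effects described above, runnable without ps: a reference in numeric context is its address (the same number Scalar::Util's refaddr returns), and assigning to a high index grows the array to that length. A small index (100_000) stands in for a real address, which would be far larger.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $aref = [];

# In numeric context a (non-overloaded) reference evaluates to its address.
my $as_number = $aref + 0;
print "address as integer: $as_number\n";
print "same as refaddr:    ", ($as_number == refaddr($aref) ? 'yes' : 'no'), "\n";

# Assigning to a high index extends the array to that length; the skipped
# slots are allocated but empty, so they read as undef. A real address used
# this way can need gigabytes, which is why the loop runs out of memory.
my @grown;
my $index = 100_000;
$grown[$index] = 'foo';
print "elements now: ", scalar(@grown), "\n";
print "slot 0 defined? ", (defined $grown[0] ? 'yes' : 'no'), "\n";
```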

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      Yes! You are right. Thank you for catching that error. I was adding an index value to an array reference. How can I change my for loop to make it work?

        The "Perlish" way to iterate element-by-element over an array is:

            c:\@Work\Perl\monks>perl -wMstrict -le "my @array = qw(zero one two three); ;; for my $element (@array) { print qq{element '$element'}; } "
            element 'zero'
            element 'one'
            element 'two'
            element 'three'

        If each "element" of @array is actually an array reference, you might then need to iterate in turn over the referenced array in similar fashion:

            c:\@Work\Perl\monks>perl -wMstrict -le "my @array = ( [ 1, 2, 3, ], [ 'v' .. 'z' ], [ qw(one two) ], ); ;; for my $arrayref (@array) { for my $element (@$arrayref) { printf qq{'$element' }; } print ''; } "
            '1' '2' '3'
            'v' 'w' 'x' 'y' 'z'
            'one' 'two'
        Please see perldsc for more info and examples.
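Applied to the loop in the question, one possible fix is to iterate over integer indices rather than over the row references, so that arithmetic like $i + 1 is meaningful. This is a sketch under assumptions: the sub name build_lines is hypothetical, $predata is assumed to be defined elsewhere, the row offsets (+1, +3, +4), the 6267-row block and "column 4 onwards" are taken from the question, and the output joins fields with "_" instead of printing a trailing "_" after each one. Rows past the end of @data are one likely source of the "uninitialized" warnings, so they default to '' here.

```perl
use strict;
use warnings;

# Hypothetical helper: walks @$data by integer index from row $start and
# refreshes the four labels at the first row of every $block-row block.
sub build_lines {
    my ($data, $predata, $start, $block) = @_;

    # Column 0 of a row, or '' if the row does not exist. Checking the row
    # first avoids autovivifying empty rows past the end of @$data.
    my $first = sub {
        my $row = $data->[ $_[0] ];
        return defined $row ? $row->[0] // '' : '';
    };

    my ($color, $order, $shape, $name);
    my @lines;
    for my $i ($start .. $#$data) {              # $i is a plain integer index
        if (($i - $start) % $block == 0) {
            ($color, $order, $shape, $name) = map { $first->($_) } $i, $i + 1, $i + 3, $i + 4;
        }
        my $row  = $data->[$i] // [];            # array reference for row $i
        my @tail = map { $_ // '' } @{$row}[ 4 .. $#$row ];   # may be empty
        push @lines, join(',', $predata // '', $color, $order, $shape, $name, join('_', @tail));
    }
    return @lines;
}

# Made-up rows; the real call would be build_lines(\@data, $predata, 2033, 6267).
my @sample = map { [ "L$_", 'a', 'b', 'c', "d$_", "e$_" ] } 0 .. 9;
print "$_\n" for build_lines(\@sample, 'P', 2, 4);
```

Returning the lines keeps the sketch testable; in the real script each line could be printed to $out_ph1 as it is built instead of collected.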


        Give a man a fish:  <%-{-{-{-<

Re: Running out of memory while running this for loop
by GrandFather (Saint) on Nov 02, 2015 at 02:07 UTC

    You really should be using a module to process CSV. Using Text::CSV for example turns your "parsing" code into:

        use Text::CSV;
        my $csv  = Text::CSV->new({ binary => 1 });   # binary => 1 lets quoted fields contain line breaks
        my @data = @{ $csv->getline_all($in_ph) };

    and as a side benefit handles commas in the data and strings with line breaks in them. Maybe your data doesn't have such wayward content, but it's sure nice to know the whole script won't fail if it does.
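To see why a plain split /,/ trips over commas in the data, here is a small comparison using the core module Text::ParseWords as a stand-in, so it runs even without Text::CSV installed. Text::ParseWords is not a full CSV parser; Text::CSV remains the better tool for real files. The sample line is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# A made-up record whose second field contains a comma inside quotes.
my $line = 'red,"Smith, John",42';

# Naive split breaks the quoted field in two.
my @naive = split /,/, $line;

# parse_line respects double quotes; keep = 0 strips them from the result.
my @parsed = parse_line(',', 0, $line);

print scalar(@naive),  " fields from split\n";
print scalar(@parsed), " fields from parse_line\n";
print "field 2: $parsed[1]\n";
```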

    If your data file was huge you might want to handle it record by record instead of loading the whole lot into memory up front. Your code then becomes something like:

        my $csv = Text::CSV->new({ binary => 1 });
        while (my $aref = $csv->getline($in_ph)) {
            ...
        }
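A sketch of the record-by-record idea using core Perl only: split /,/ stands in for $csv->getline and assumes no quoted commas. The sub name stream_tail_columns and the in-memory sample data are hypothetical. It prints columns 4 onwards, joined with "_", for every row from a start row on, holding only the current row in memory. The question's label lookahead (rows +1, +3, +4) would need a small buffer of upcoming rows, which is not shown.

```perl
use strict;
use warnings;

# Hypothetical sketch: stream rows one at a time instead of building @data.
sub stream_tail_columns {
    my ($in, $out, $start) = @_;
    my $row = 0;                              # 0-based, like $data[$row]
    while (my $line = <$in>) {
        chomp $line;
        next if $row++ < $start;              # skip rows before $start
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        my @tail   = @fields[ 4 .. $#fields ];
        print {$out} join('_', @tail), "\n";  # only the current row is in memory
    }
}

# Usage with in-memory filehandles and made-up data.
my $csv_text = join '', map { "r$_,a,b,c,d$_,e$_\n" } 0 .. 4;
open my $in, '<', \$csv_text or die "open input: $!";
my $result = '';
open my $out, '>', \$result or die "open output: $!";
stream_tail_columns($in, $out, 2);
close $out;
print $result;
```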
    Premature optimization is the root of all job security