pigal has asked for the wisdom of the Perl Monks concerning the following question:

Greetings to all Glamorous and Honourable MONKS!!!
I've got two two-dimensional arrays (arrays of arrays), each with n*10^3 rows. I need them both separately and glued together (corresponding rows in the two arrays describe the same object). My problem is how to glue them together efficiently.
Example( :-) ):
@arr1 = ([1,2,3],[4,5,6],[7,8,9]);
@arr2 = ([11,22,33],[44,55,66],[77,88,99]);
@garr = magic(\@arr1, \@arr2); # ([1,2,3,11,22,33],[4,5,6,44,55,66],[7,8,9,77,88,99])

I could do it like this:
for (my $i = 0; $i < scalar(@arr1); $i++) {
    # build a new row from the cells of both source rows
    $garr[$i] = [ @{ $arr1[$i] }, @{ $arr2[$i] } ];
}
but 50000 loop iterations kill my weak server (rows have ~n*10 cells).
Best Regards
Peter

Replies are listed 'Best First'.
Re: Expanding Two dimensional arrays
by dragonchild (Archbishop) on Jan 29, 2007 at 14:11 UTC
    Use DBM::Deep. It will be slower than RAM, but it won't cause your script to go into swap.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      I'm concerned about this sentence from the synopsis: "A unique flat-file database module, written in pure perl." Is this efficient?
      THX
        Yes, it is efficient. It's not fast. It does several things that no other DBM (that I know of) does:
        • It handles nested Perl hashes and arrays to any level
        • The file is ftp'able across OSes and endian-ness
        • It is Pure-Perl, which means it's easily deployed on Windows
        • If you use the tie interface, you only make a change in one place in your program and the rest of your program thinks it's using a regular variable.

        Hence, given what I know about your problem, this sounds like a really good first-pass to see if memory thrashing is your root issue.


Re: Expanding Two dimensional arrays
by davorg (Chancellor) on Jan 29, 2007 at 14:14 UTC
      Nice, but the waiting time is too long for end users.
      THX
Re: Expanding Two dimensional arrays
by Tanktalus (Canon) on Jan 29, 2007 at 16:09 UTC

    What do you mean 'kills'? For example, does it peg the CPU? Start swapping? What?

    Depending on that answer, there may be options to help that. Or maybe not. For example, if you're swapping, DBM::Deep may help - but, in many ways, that's just swapping one part of the hard drive for another (no pun intended). How about not gluing them up until you're about to use them? e.g.,

    for (my $i = 0; $i < scalar @arr1; ++$i) {
        my @garr = ( @{ $arr1[$i] }, @{ $arr2[$i] } );
        # do main processing on @garr here.
    }
    That way, you only have the combination of one row at a time in memory, discarding when done. Or, perhaps you can destroy the original arrays as you go?
    while (@arr1) {
        # shift removes the first row from each source array
        push @garr, [ @{ shift @arr1 }, @{ shift @arr2 } ];
    }
    At the end of this, @arr1 and @arr2 should be empty.
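A minimal sketch of the first idea (gluing one row at a time, only when it is needed), wrapped in a closure so the caller never holds more than one glued row. The name make_row_gluer is hypothetical and not from this thread; the sample data is the example from the question. The source arrays are left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one glued row per call
# (as an array reference) and undef once the input is exhausted.
# Neither source array is modified.
sub make_row_gluer {
    my ($left, $right) = @_;
    my $i = 0;
    return sub {
        return if $i > $#{$left};
        my @row = ( @{ $left->[$i] }, @{ $right->[$i] } );
        $i++;
        return \@row;
    };
}

my @arr1 = ( [1, 2, 3],    [4, 5, 6],    [7, 8, 9] );
my @arr2 = ( [11, 22, 33], [44, 55, 66], [77, 88, 99] );

my $next_row = make_row_gluer(\@arr1, \@arr2);
my @seen;
while ( my $row = $next_row->() ) {
    # Main processing on one glued row goes here.
    push @seen, join(',', @$row);
}
print "$_\n" for @seen;
```

Each glued row becomes garbage as soon as the loop body finishes with it, so peak memory stays close to the size of the two source arrays.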

    If you're pegging the CPU, your actual handling of @garr will peg it, too, so there's really not much more I can suggest.

      - CPU: 100% and timeout for tea :-)
      - I have no idea how to check for swapping on Linux RH9, but the HDD is working hard, so maybe there are problems there too.
      - Your first advice is great, but I have no power to force it (I've tried, though, and who knows, maybe it will be accepted).
      - I cannot destroy the source arrays.

      THX
Re: Expanding Two dimensional arrays
by Sagacity (Monk) on Jan 29, 2007 at 18:02 UTC

    If you're running out of dynamic resources such as RAM, you may want to replace @garr with a file on the hard drive as your output holder. It holds the same data but relieves your system memory of the ever-growing array @garr. You will, of course, then have to open it and read the contents for your final output, but again this is ONLY a thought with respect to conserving dynamic resources while the main concatenation is proceeding. It is a redirect and nothing more.

    The 50000 iterations would then only consume memory for the first 2 arrays. Once all of the information is packed into the output file, you could then set their values to ''. The point is to conserve RAM usage while running.
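A rough sketch of that idea, using only core modules. The tab-separated line format is an assumption made for illustration (it requires that no cell contains a tab or newline), and the temporary file comes from File::Temp rather than a path of your choosing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @arr1 = ( [1, 2, 3],    [4, 5, 6],    [7, 8, 9] );
my @arr2 = ( [11, 22, 33], [44, 55, 66], [77, 88, 99] );

# Write each glued row to disk as one tab-separated line instead of
# keeping it in a growing @garr.
my ($out, $filename) = tempfile( UNLINK => 1 );
for my $i ( 0 .. $#arr1 ) {
    print {$out} join("\t", @{ $arr1[$i] }, @{ $arr2[$i] }), "\n";
}
close $out or die "Cannot close $filename: $!";

# Later, read the glued rows back one line at a time for the final output.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $rows_read = 0;
my @first_row;
while ( my $line = <$in> ) {
    chomp $line;
    my @row = split /\t/, $line;
    @first_row = @row if $rows_read == 0;
    $rows_read++;
    # Final output for this row goes here.
}
close $in;
print "read $rows_read glued rows\n";
```

Only one glued row is in memory at any moment; the cost is the extra disk I/O for writing and re-reading the file.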

    It's just a suggestion

Re: Expanding Two dimensional arrays
by BrowserUk (Patriarch) on Jan 30, 2007 at 12:42 UTC

    You might consider using PDL. Piddles use far less space than normal Perl arrays. Also (I think; you'll need to explore it or consult a PDL expert), it would be able to combine the two piddles into a third using its Dataflow feature, in a way that would not cause them to be copied wholesale and therefore would use even less space.

    In theory, assuming your numbers are 32-bit ints, a piddle would require ~1/3 or 1/4 the space of an equivalent Perl array. And, if the Dataflow aliasing will work to allow you to create the 3rd array as a mapping over the other two, it's possible the memory requirement for holding all 3 piddles could be close to 1/6th of the space required by the Perl solutions. PDL is also renowned for doing what it does very quickly.
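This is not PDL, only a core-Perl illustration of why packed 32-bit storage is compact: each row is kept as one string holding 4 bytes per cell, using pack with the 'l' (signed 32-bit) template. It assumes every cell is an integer that fits in 32 bits, and unlike the Dataflow idea above it does copy the data when gluing.

```perl
use strict;
use warnings;

my @arr1 = ( [1, 2, 3],    [4, 5, 6],    [7, 8, 9] );
my @arr2 = ( [11, 22, 33], [44, 55, 66], [77, 88, 99] );

# Store each row as a packed string of signed 32-bit integers (4 bytes per cell).
my @packed1 = map { pack 'l*', @$_ } @arr1;
my @packed2 = map { pack 'l*', @$_ } @arr2;

# Gluing two packed rows is plain string concatenation.
my @glued = map { $packed1[$_] . $packed2[$_] } 0 .. $#packed1;

# Unpack a row only when its cells are actually needed.
my @row0 = unpack 'l*', $glued[0];
print "@row0\n";                                            # 1 2 3 11 22 33
printf "%d bytes per glued row\n", length $glued[0];        # 24 bytes per glued row
```

A row of six cells takes 24 bytes of payload this way, whereas a Perl array of six scalars carries per-element overhead on top of the values themselves.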

    Sorry for the scant information, but I really don't know much about PDL, so take this suggestion with a huge handful of salt.

    With luck lin0 or Zaxo or one of the other monks with PDL expertise will read this and expand further or shoot me down in flames.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Expanding Two dimensional arrays
by pigal (Novice) on Jan 30, 2007 at 12:30 UTC
    I've pushed through Tanktalus's advice #1, so the problem is solved for me.
    THANKS, EVERYONE, FOR THE HELP