Re: Parallel::ForkManager, DBI using memory

by biosysadmin (Deacon)
on Nov 07, 2004 at 22:27 UTC ( [id://405950] )


in reply to Parallel::ForkManager, DBI using memory

Based on your conversation with The Mad Hatter, it looks like you'll need a separate $dbh for each child process. That said, I can't imagine that every one of your child processes actually needs its own database handle.

I'm not sure of the exact mechanism for memory management in forking processes (it may even vary across operating systems), but I'm guessing that every time you use Parallel::ForkManager to fork off another process, you end up making another copy of the program's namespace. If that's the case, your program would use this much memory regardless of how DBI implements database handles. For a simple test, comment out all of the lines in your code dealing with database handles and see if memory grows in the same way.
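
A stripped-down version of that test might look like this; it's a rough, untested sketch, and the @links list and worker count here are placeholders standing in for whatever your real code uses:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    # Same loop shape as before, but with every DBI line removed. If the
    # perl process still grows as children are forked, the growth comes
    # from forking itself rather than from DBI.
    my @links = map { "http://example.com/page$_" } 1 .. 1000;   # placeholder
    my $pm = Parallel::ForkManager->new(5);

    foreach my $x (0 .. $#links) {
        $pm->start and next;     # parent: skip ahead and fork the next child
        print "$links[$x]\n";    # the non-database work would go here
        $pm->finish;             # child exits
    }
    $pm->wait_all_children;

Watching the process with top or ps while this runs should tell you whether the database handles are really to blame.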

If this were my problem, I'd try to balance the speedup of running multiple Parallel::ForkManager processes against the need to keep the number of database connections low. Why not divide your @links array into N parts, each processed by a separate child process? I've used that approach in other programs with Parallel::ForkManager, and it's worked very well; see the sketch below.
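
Here's roughly what I mean, as an untested sketch: the connection parameters are taken from your posted code, the round-robin split is just one way to divide the array, and the per-link work is left as a comment:

    use strict;
    use warnings;
    use DBI;
    use Parallel::ForkManager;

    my @links = map { "http://example.com/page$_" } 1 .. 1000;  # placeholder
    my $n     = 4;                            # number of worker processes
    my $pm    = Parallel::ForkManager->new($n);

    # Deal the links round-robin into $n batches.
    my @batches;
    push @{ $batches[ $_ % $n ] }, $links[$_] for 0 .. $#links;

    for my $batch (@batches) {
        $pm->start and next;                  # parent: fork the next worker

        # One handle per child, created after the fork and reused for the
        # whole batch instead of once per link.
        my $dbh = DBI->connect(
            "DBI:mysql:database=customers;host=localhost;port=3306",
            "root", "", { RaiseError => 1 },
        ) or die "Can't connect: $DBI::errstr";

        for my $link (@$batch) {
            print "$link\n";
            # do something (read, update, insert from db)
        }

        $dbh->disconnect;
        $pm->finish;                          # child exits here
    }
    $pm->wait_all_children;

This way you only ever open N connections in total, no matter how long @links gets.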

If this is a serious application, you might even do some benchmarking to determine the limiting factor in your processing. If the work is CPU-bound and your box has multiple CPUs, set N equal to the number of CPUs. The name @links suggests to me that it might be network-bound, in which case a moderate N (10-50, in my mind) might be a good idea. Only benchmarking will tell the whole story; something like the little harness below would do.
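
By benchmarking I mean nothing fancier than timing a full run for a few values of N and keeping the fastest; run_batches() here is a hypothetical stand-in for the batched loop sketched above:

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Placeholder: fork $n workers and process all the links, as in the
    # batching sketch above.
    sub run_batches {
        my ($n) = @_;
        # ...
    }

    for my $n (1, 2, 4, 10, 25, 50) {
        my $t0 = [gettimeofday];
        run_batches($n);
        printf "N=%-3d  %.2f seconds\n", $n, tv_interval($t0);
    }

Best of luck with your problem. :)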

Re^2: Parallel::ForkManager, DBI using memory
by 2ge (Scribe) on Nov 08, 2004 at 13:36 UTC
    Thanks for the answer, biosysadmin.

    I read some documentation about this, and yes, every child process needs its own db handle. Next, of course I tried commenting out the DBI stuff, and it works fine then. So is it a bug in DBI, some memory leak, or what? I don't believe that.

    And that's not the point. Whether I run 5 processes at once, or 50, or just 2: every time a child ends, it takes some memory with it, so by the end of the script the perl process consumes the same amount of memory either way. I hope someone will give me an answer to this interesting question.

    Brano
      Interesting. Another thing you might try is manually undef'ing your database handles at the end of your loop, like this:
      foreach my $x (0..$#links) {
          $pm->start and next;
          my $dbh = DBI->connect(
              "DBI:mysql:database=customers;host=localhost;port=3306",
              "root", "",
          ) or die "Can't connect: ", $DBI::errstr;
          $dbh->{RaiseError} = 1;
          print "$links[$x]\n";
          # do something (read, update, insert from db)
          $dbh->disconnect;
          undef($dbh);
          $pm->finish;
      }
      Best of luck. :)
        Hi biosys!

        Thanks for the next suggestion. I tried it, but unfortunately it doesn't help. Also, you have one little error in your posted script: undef($dbh) should come before $pm->finish. Any more ideas? :)
        I really don't know how to solve this. I have around 15,000+ iterations, so I will always run out of memory this way :(

        --
        Brano
