in reply to Re^4: Attempt to free temp prematurely and unreferenced scalar
in thread Attempt to free temp prematurely and unreferenced scalar
Honestly, without seeing the code, anything would be (another) guess. What are you doing with 60k of input data to create even a single 20 MB data structure?
Re^6: Attempt to free temp prematurely and unreferenced scalar
by neversaint (Deacon) on Feb 22, 2006 at 11:32 UTC
Basically, what my code does is take sets of DNA input sequences and find conserved substrings within them. The data structures expand because, for each length-"W" substring taken from the input sequences, I again collect all of its substrings. I run the "main_process" subroutine multiple times, once per parameter set (generated with the gen_param subroutine). Don't be overwhelmed by my code below; you can ignore much of it. The out-of-memory message only occurs after it completes the first set of parameters, and then it breaks. See the last portion of the "main_process" subroutine. I really hope to hear from you again. Read more... (14 kB)
--- neversaint and everlastingly indebted.......
Re^7: Attempt to free temp prematurely and unreferenced scalar
by BrowserUk (Patriarch) on Feb 22, 2006 at 14:09 UTC
Phew! Where to start :) I can't tell you exactly what is blowing your memory; without the sp_neg_eff module and suitable data files it's impossible for me to run it. Overall, there are several routines I don't have sight of, lots of loops, and lots of arrays, so it is very difficult to work out where the problem lies by inspection alone. Nothing has leapt off the page at me as the obvious cause. It may simply be that the cumulative effect of all those loops, arrays, and hashes finally consumes your RAM completely.

However, I can point out some things that are certainly not helping either your memory consumption or your processing time. This may come in dribs and drabs as I wrap my brain around your code. I'll keep it all in the one post and /msg you if I update it.

Example 1. Your getSeqfromfasta2lmers routine is doing way more work, and using three times as much memory, as is necessary.
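From your description, the routine follows a pattern something like this (a sketch of the pattern, not your exact code):

```perl
use Bio::SeqIO;

# Hypothetical reconstruction: read each sequence, push it to a local
# array, then return that array as a list, which the caller copies yet
# again -- three copies of the data in flight at once.
sub getSeqfromfasta2lmers {
    my( $file ) = @_;
    my $in = Bio::SeqIO->new( -file => $file, -format => 'fasta' );
    my @seqs;
    while( my $seq = $in->next_seq ) {
        push @seqs, $seq;
    }
    return @seqs;
}
```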
You are calling this routine once outside your main loop and then again inside it each time around the loop.
As the name of the file never changes, you are re-reading the same file many times, and as far as I can tell you never modify the contents of the array, so this is just a waste of cycles. As constructed, the routine reads the sequences one at a time and pushes them to a local array. You then return this array as a list to the caller, where the sequences are assigned to another array or, in the case of the first call, simply counted and then discarded. A quick look at the docs for Bio::SeqIO shows that it has a method specially designed for reading all the sequences from a fasta file, namely ->newFh(). I don't get the logic behind the name, but the use is simple. The following is an (almost) drop-in replacement for your version. It will require a couple of other changes in your code, but those would be best changed anyway.
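A sketch of that replacement (assuming Bio::SeqIO's newFh interface, and that reading its tied filehandle in list context returns all remaining sequences at once):

```perl
use Bio::SeqIO;

sub getSeqfromfasta2lmers {
    my( $file ) = @_;
    # newFh ties a filehandle to the SeqIO stream; <$fh> yields Bio::Seq
    # objects, and in list context drains the whole file in one pass.
    my $fh = Bio::SeqIO->newFh( -file => $file, -format => 'fasta' );
    return <$fh>;
}
```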
And then call the routine ONCE at the top of the program, assign the sequences into an array, and then re-use that array each time around the main loop.
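Something along these lines (the variable names here are guesses, not your originals):

```perl
# Read the fasta file once, before the parameter loop starts
my @sequences = getSeqfromfasta2lmers( $fastaFile );
my $noofseqs  = scalar @sequences;   # count here rather than re-reading
```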
And delete the following two lines from the top of the main sub
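Presumably they look something like this (reconstructed from context, not verbatim):

```perl
my @sequences = getSeqfromfasta2lmers( $fname );   # re-reads the file every call
my $noofseqs  = scalar @sequences;
```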
What difference, if any, these changes will make to your overall problem I'm not sure, but they will not harm. It's not very PC to pass data to subs via closure this way, but it avoids messing with references, and you are already passing (too) many arguments to that sub as it is.

Example 2. Equally, one of your routines builds intermediate arrays it doesn't need, and could be recoded to avoid them. A before-and-after sketch of the pattern follows (hypothetical code, since I'm illustrating the pattern rather than quoting your listing):
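```perl
# Before (pattern only): split the sequence into a char array, build a
# second array of W-mers from it, then return that -- several structures
# alive at once.
sub getWmers {
    my( $seq, $w ) = @_;
    my @chars = split //, $seq;
    my @wmers;
    for my $i ( 0 .. $#chars - $w + 1 ) {
        push @wmers, join '', @chars[ $i .. $i + $w - 1 ];
    }
    return @wmers;
}
```

Could be recoded as:

```perl
# After: substr slices the string directly; no intermediate arrays
sub getWmers {
    my( $seq, $w ) = @_;
    return map { substr $seq, $_, $w } 0 .. length( $seq ) - $w;
}
```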
I've no idea how big those intermediate arrays get, but they are not helping you in any way.

Example 3. This won't affect your memory, but it made it easier for me to work out how many times the main loop/sub iterates. You don't need to assign lists to arrays in order to use them in foreach loops; iterate over the sub's return list directly, as in the sketch below, and it becomes obvious that there are twelve anonymous hashes being returned by the sub.
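A sketch of the change (gen_param and main_process are your names; any arguments are placeholders):

```perl
# Instead of:  my @paramSets = gen_param();  foreach my $p ( @paramSets ) {...}
foreach my $param ( gen_param() ) {   # twelve anonymous hashes, used in place
    main_process( $param );
}
```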
There are also a couple of places where you are assigning array references to hash elements like this:
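The pattern in question (paraphrased; the names are placeholders):

```perl
# copies the local array into a brand-new anonymous array
$hash{ $key } = [ @localArray ];
```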
In both cases the arrays are local, so you will save some space by simply doing this instead:
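Again with placeholder names:

```perl
# take a reference to the existing local array instead of copying it;
# since it is a fresh lexical each time around, keeping the reference is safe
$hash{ $key } = \@localArray;
```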
Example 4. Then there are oddities, like accumulating the returns from the main sub in a hash:
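Something like this (paraphrased; the key shown is a placeholder):

```perl
# every parameter set's "result" gets stored away
$result{ $setName } = main_process( $param );
```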
But the main routine doesn't return anything?
It will probably be intensely frustrating to you for me to say that you are going to have to simplify your code before you will be able to track down the cause of your problem. I doubt these changes, individually or collectively, will have any great effect on your memory consumption, but they may help you clean up the code enough to let you see where the real problem lies.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^8: Attempt to free temp prematurely and unreferenced scalar
by neversaint (Deacon) on Feb 22, 2006 at 14:25 UTC
> they may help you clean up the code enough to let you see where the real problem lies.

Dear BrowserUk, I don't know how to thank you! I have learned so much from your latest reply. I just realized I'm still not as good at coding Perl as I thought I was, by far. I also really appreciate your offer to look over my overall code (sp_neg_eff.pm) and the dataset, but I wouldn't dare to cause you more trouble after such kindness. I think that by cleaning up my code as above, I can figure out what the problem is.
--- neversaint and everlastingly indebted.......