Re: Iteration speed

by BrowserUk (Patriarch)
on Jun 15, 2004 at 22:42 UTC ( [id://367076] )


in reply to Iteration speed

Describing your problem in terms that only another biochemist will understand means that most of us here will only be able to guess at what your program needs to do.

  • What do co-ordinates for multi-chain proteins look like?
  • What does an interaction between residues look like, and how do you calculate it?

The best way to speed up iteration is to avoid iterating. Lookups are fast.
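
By way of illustration only (the record layout and names here are made up, not from the thread): build a hash index once, and every query after that is a lookup rather than a scan.

    use strict;
    use warnings;

    my @records = map { +{ id => $_, value => $_ * 2 } } 1 .. 100_000;

    # Slow: a linear scan for every query.
    # my( $hit ) = grep { $_->{ id } == 42000 } @records;

    # Fast: pay for the index once; each query is then a single lookup.
    my %by_id = map { $_->{ id } => $_ } @records;
    my $hit   = $by_id{ 42000 };
    print "value: $hit->{value}\n";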

If your dataset is too large to fit in memory, forcing you to re-read files, then the first improvement I would make is to avoid re-parsing those files on every pass. A pre-processing step that parses your files into convenient data structures, then writes those structures to disk in packed or Storable binary format, would probably speed up the loading considerably.
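
A minimal sketch of that pre-processing pass, assuming whitespace-separated "id x y z" lines; the filenames and field layout are assumptions, not from the original post.

    use strict;
    use warnings;
    use Storable qw( nstore retrieve );

    my $cache = 'coords.stor';
    my $coords;

    if( -e $cache ) {
        # Subsequent runs: one fast binary read instead of a re-parse.
        $coords = retrieve( $cache );
    }
    else {
        # First run: parse the text file once...
        open my $fh, '<', 'coords.txt' or die "coords.txt: $!";
        while( <$fh> ) {
            my( $id, $x, $y, $z ) = split;
            $coords->{ $id } = [ $x, $y, $z ];
        }
        close $fh;
        # ...and write the parsed structure to disk in binary form.
        nstore( $coords, $cache );
    }

    printf "Loaded %d residues\n", scalar keys %$coords;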


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

Re^2: Iteration speed
by jepri (Parson) on Jun 16, 2004 at 13:05 UTC
    Oh, there's a few of us around :)

    The problem, as noted by others, is that we can't see his code to make suggestions. Shrug. Can't help much there. He doesn't even say whether he's using the Perl bioinformatics modules or has rolled his own.

    In any case though, this is a problem that is begging for a parallel-processing solution. In general, I'd recommend he break up the dataset and run it on all the machines in the lab; a sketch of one way to split the work follows. I doubt that there are many algorithmic improvements that can beat adding another 5 CPUs to the task.
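
    A sketch only, assuming the job can be partitioned by input chunk; Parallel::ForkManager is on CPAN, but the chunk filenames and worker script here are made up.

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my @chunks = glob 'chunks/part*.dat';          # hypothetical pre-split input
        my $pm     = Parallel::ForkManager->new( 5 );  # one worker per spare CPU

        for my $chunk ( @chunks ) {
            $pm->start and next;                       # parent loops on; child continues
            system( 'perl', 'process_chunk.pl', $chunk ) == 0
                or die "worker failed on $chunk: $?";
            $pm->finish;                               # child exits here
        }
        $pm->wait_all_children;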

    ___________________
    Jeremy
    I didn't believe in evil until I dated it.

      I know there are a few of you guys around, but the description left me (and, judging from the responses, a few others) completely cold :)

      Belatedly, I have begun to think that this problem is related to a previous thread. If that is the case, I think that an algorithmic approach similar to the one I outlined at Re: Re: Re: Processing data with lot of math... could cut the processing time to a fraction of that of a brute-force iteration. As I mentioned in that post, my crude testing showed that by limiting the comparisons to a fraction of the possible pairs using an intelligent search, I can process 100,000 coordinates and find 19,000 matching pairs in around 4 minutes without trying very hard to optimise. A rough sketch of the idea appears below.
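
      The sketch: bucket the points into a grid of cutoff-sized cells so each point is compared only against its own and neighbouring cells, not all 100,000 others. The 3-D point format and the cutoff value are assumptions for illustration, not taken from the earlier thread.

          use strict;
          use warnings;

          my $cutoff = 5.0;    # pairing radius; the value is an assumption
          my @points = map { [ rand 100, rand 100, rand 100 ] } 1 .. 10_000;

          # Hash every point into a cell keyed by its integer grid coordinates.
          my %cell;
          for my $i ( 0 .. $#points ) {
              my $key = join ',', map { int( $_ / $cutoff ) } @{ $points[ $i ] };
              push @{ $cell{ $key } }, $i;
          }

          # Each point need only be compared against the 27 surrounding cells.
          my @pairs;
          for my $i ( 0 .. $#points ) {
              my( $cx, $cy, $cz ) = map { int( $_ / $cutoff ) } @{ $points[ $i ] };
              for my $dx ( -1 .. 1 ) {
              for my $dy ( -1 .. 1 ) {
              for my $dz ( -1 .. 1 ) {
                  my $bucket = $cell{ join ',', $cx + $dx, $cy + $dy, $cz + $dz }
                      or next;
                  for my $j ( @$bucket ) {
                      next unless $j > $i;    # count each pair exactly once
                      my $d2 = 0;
                      $d2 += ( $points[ $i ][ $_ ] - $points[ $j ][ $_ ] ) ** 2
                          for 0 .. 2;
                      push @pairs, [ $i, $j ] if $d2 <= $cutoff ** 2;
                  }
              } } }
          }
          printf "%d pairs within %.1f\n", scalar @pairs, $cutoff;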

      I agree that a distributed search would perform the same task more quickly, but the additional complexity of setting up that kind of system is best avoided if it can be. And if this is the same problem, it easily can be: my test code from the previous thread is under 60 lines including the benchmarking.

      What stops me from offering that code here is the lack of 1) a clear description of the problem, and 2) some real test data.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
