
Dear all

My problem is a bioinformatics one. I'm currently running a script that processes 13,000 files containing co-ordinates for multi-chain proteins.

It calculates the interactions, if any, between the residues of each pair of chains.

Thus, it is iterating not only through each chain pair, but through every possible residue pair (though not residues in the same chain of course) and seeing if they are close enough, and determining the kind of interaction if so.

My problem, which I fear is unavoidable because I cannot afford to miss any possible interaction, is that some of the larger files take hours, even more than a day, to process.

At this rate, it could take nearly a year to go through all the files, which is unfortunate.

Take, for example, a 6-chain protein with approx. 3,000 residues; that's approx. 500 residues per chain. In one chain pair there are 500 x 500 = 250,000 iterations, and because there are 6 chains, that's 15 possible chain pairs (avoiding repeats, e.g. AB == BA), so that's 3.75 million iterations in total.
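To make that arithmetic concrete, here is a quick sketch (Python purely for illustration; the six equal chains of 500 residues are the assumed example above, and `count_residue_pairs` is a made-up helper name):

```python
from itertools import combinations

def count_residue_pairs(chain_sizes):
    """Count residue pairs where the two residues sit in different chains."""
    # Each unordered chain pair (a, b) contributes size_a * size_b comparisons.
    return sum(a * b for a, b in combinations(chain_sizes, 2))

chain_sizes = [500] * 6  # assumed: 6 chains of roughly 500 residues each
chain_pairs = len(list(combinations(range(len(chain_sizes)), 2)))
total = count_residue_pairs(chain_sizes)

print(chain_pairs, total)  # 15 3750000
```

The count grows with the square of the residue count, so a file twice this size needs about four times as many comparisons.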

I just wanted to know, what are the potential bottlenecks? One such file (larger than the above example) is still being processed after 1.5 days!

The way my program runs is that, while it reads in the file, for every new residue it iterates through the list currently in memory (skipping residues in the same chain, of course) to look for new interactions. At the same time, it populates a database with the atom and residue details, and the interactions, if any. Is this a stupid way of doing it?
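A minimal sketch of that loop, to show its shape rather than the actual script: it is in Python for illustration, it assumes one representative point per residue and a hypothetical 5.0 cutoff, and it leaves out the interaction typing and the database writes. The name `find_interactions` is made up.

```python
import math

def find_interactions(residues, cutoff=5.0):
    """Incremental scan: compare each newly read residue with all earlier ones.

    residues: iterable of (chain_id, residue_id, (x, y, z)) tuples.
    cutoff:   hypothetical distance threshold, in the units of the co-ordinates.
    """
    seen = []          # residues read so far
    interactions = []  # pairs of (chain_id, residue_id) found within the cutoff
    for chain, res_id, xyz in residues:
        for other_chain, other_id, other_xyz in seen:
            if other_chain == chain:
                continue  # skip residues in the same chain
            if math.dist(xyz, other_xyz) <= cutoff:
                interactions.append(((other_chain, other_id), (chain, res_id)))
        seen.append((chain, res_id, xyz))
    return interactions

example = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (1.0, 0.0, 0.0)),
    ("B", 1, (3.0, 0.0, 0.0)),
    ("B", 2, (20.0, 0.0, 0.0)),
]
print(find_interactions(example))
```

Every new residue is checked against everything already in memory, so the work per residue grows as the file is read.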

Cheers
Sam

In reply to Iteration speed by seaver
