seaver has asked for the wisdom of the Perl Monks concerning the following question:
Let's say I'm doing this:

    foreach my $atom (sort keys %{ $residues{$residue} }) { do something with $atom; }

The atom names should be ordered by either i) their distance from the N-terminal, or ii) their distance from the main chain. This falls into three rough categories:

Hence, for Tyrosine, I have these atoms, ordered correctly:

    N CA C O CB CG CD1 CD2 CE1 CE2 CZ OH 2HB 3HB HD1 HD2 HE1 HE2 HH

If my initial loop returned the names randomly, as hashes do, what should I do to sort them? I've made little attempt at this because I've never really understood anonymous subroutines, or inline subroutines, which is what you usually do with sort. But I imagine I should create a named subroutine that returns 1, 0 or -1 according to my own criteria as listed above, am I right?
Cheers
Sam
|
Re: sorting according to greek alphabet in roman letters
by BrowserUk (Patriarch) on Oct 01, 2003 at 17:42 UTC
Here's a simplistic mechanism that would probably benefit from using a GRT. It relies upon having a pre-ordered lookup table, here in the form of a simple space-delimited string, and the ordering is performed by a simple numerical comparison of each target's position within the string.
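A minimal sketch of that lookup-string idea, not BrowserUk's original code. The table contents (the Tyrosine atoms from the question) and the helper name `position` are assumptions made for illustration:

```perl
use strict;
use warnings;

# Assumed pre-ordered lookup table: every atom name that can occur,
# in the order it should sort. Leading and trailing spaces let us
# search for " NAME " so that 'C' cannot match inside 'CA' or 'CB'.
my $order = ' N CA C O CB CG CD1 CD2 CE1 CE2 CZ OH 2HB 3HB HD1 HD2 HE1 HE2 HH ';

# Position of an atom name within the table (hypothetical helper).
sub position {
    my ($atom) = @_;
    my $pos = index( $order, " $atom " );
    # index returns -1 on a miss; push unknown names to the end instead.
    return $pos < 0 ? length($order) : $pos;
}

my @shuffled = qw( HH CZ 2HB N CD2 O CA HE1 C CB OH CG );
my @sorted   = sort { position($a) <=> position($b) } @shuffled;

print "@sorted\n";
```

The cost is that every possible atom name has to be listed up front, which is the drawback raised in the next paragraph.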
However, I suspect that you would prefer not to pre-construct the ordering of all possible compounds, and would rather have their "weight" calculated according to your set of rules. That would be possible using a lookup hash mapping the symbols to weights and then some math to multiply by the number of hydrogens (terminology?). Coding this would require a clearer set of rules to work with. The only minor problem I see is parsing combined symbols: do CA, CB, CD, CE, CZ break down into "C & A", "C & B", etc., or are they intrinsic? That is easily surmountable, though.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
by seaver (Pilgrim) on Oct 01, 2003 at 19:42 UTC
Columns 1 and 2 are the actual atomic element (1 or 2 letters, right-justified), except in the case of hydrogens, where column 1 is actually the number of the hydrogen. Column 3 is the distance, specified by a Greek letter. Column 4, mostly empty, is the number of the heavy atom at that distance. E.g. Leucine has two 'CD's, so they are CD1 and CD2; the hydrogens respectively are: 1HD1, 2HD1, 3HD1, 1HD2, 2HD2 and 3HD2.

So weight can indeed be given. The highest weight would be given to atom names that consist of the element only, N, C and O, and also to the 'first' carbon, CA. Thereafter, it matters little which heavy atom it is; only the distance is important, so the second tier of weights will be given to the distance: B, G, D, E, Z, H.

Finally, the hydrogens are last, and their weights are threefold: their number, followed by their distance, followed by their heavy atom number at that distance.

I'd started to 'weigh' things within the sortAtom function mentioned below this way:

My aim was to then do this:

Is this what you're getting at, BrowserUk?

Thanks
by BrowserUk (Patriarch) on Oct 01, 2003 at 23:19 UTC
Sorry, but you are still couching this in terms I don't really understand. They may be second nature to you, but the terms "distances", "N-terminals", "main chains" etc. all have a meaning to me, but not one that makes any sense in this context. You should try to phrase your questions in terms people without your specialist knowledge can understand if you are to get a good response.

What this tells me is that you have the four parts that decide the sort order already separated, but that you are concatenating these together in order to sort them. This results in a variable length entity, where

I realise that I have exaggerated this somewhat. It is possible to infer (or negate the possibilities) from your description, but there are still enough left to make it extremely difficult to know where to begin in trying to help you. Suffice to say, if the rest of your application could be bent to handling the atoms as arrays or hashes where the parts are separate, it would make the sorting easier.

Nevertheless, I *think* that this would do the trick. It's a fairly standard ST in its basic layout, but the devil is in the details. The values used for the comparisons are numerical weights which are calculated in the first mapping. The awkward part was in coming up with a mapping that spread the possible variations and combinations into a number space such that the result met your requirements.

It uses two hash table lookups. The first, %mainAtoms, establishes the basic ordering in the range 1 .. 9. The second, %distances, uses floating point multipliers to map N[BGDEZH] to 10.0 .. 10.5; CA[BGDEZH] to 20.0 .. 20.5 and so on up to 5H[BGDEZH] to 90.0 .. 90.5. The third, numeric element of the atomNames is divided by 100 and added, thus providing the final arbitration.
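A rough sketch of a Schwartzian Transform (ST) driven by numerical weights, for readers following along. This is not BrowserUk's code: the hashes `%backbone` and `%distance`, the `weight` helper and the weight values are all assumptions, and the scheme is simpler than the 10.0 .. 90.5 mapping described above. It also assumes single-letter elements and that hydrogens order by distance, then heavy-atom number, then hydrogen number, which is what reproduces the Tyrosine list from the question:

```perl
use strict;
use warnings;

# Assumed weights, chosen only to reproduce the Tyrosine ordering.
my %backbone = ( N => 1, CA => 2, C => 3, O => 4 );
my %distance = ( A => 0, B => 1, G => 2, D => 3, E => 4, Z => 5, H => 6 );

# Map an atom name to a single number (hypothetical helper).
sub weight {
    my ($name) = @_;
    return $backbone{$name} if exists $backbone{$name};

    # Optional hydrogen number, element, Greek distance letter,
    # optional heavy-atom number.
    my ( $hnum, $element, $dist, $num ) =
        $name =~ /^(\d?)([A-Z])([ABGDEZH])(\d?)$/
        or return 1e9;    # unparsable names sort last

    my $w = ( $element eq 'H' ? 1000 : 100 )    # hydrogens after heavy atoms
          + 10 * $distance{$dist}               # then by distance
          + ( $num  || 0 )                      # then heavy-atom number
          + ( $hnum || 0 ) / 10;                # then hydrogen number
    return $w;
}

my @atoms = qw( HH 3HB CE2 O HD1 CZ N CD1 2HB OH CA HE2 CG C HD2 CB CE1 HE1 CD2 );

# The ST itself: decorate each name with its weight, sort on the
# weight, then strip the decoration again.
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, weight($_) ] } @atoms;

print "@sorted\n";
```

Each weight is computed once per atom rather than once per comparison, which is the point of the transform.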
by seaver (Pilgrim) on Oct 02, 2003 at 16:50 UTC
|
Re: sorting according to greek alphabet in roman letters
by Paladin (Vicar) on Oct 01, 2003 at 17:30 UTC
"I've never really understood anonymous subroutines, or inline subroutines, which is what you usually do with sort. But I imagine I should create a named subroutine that returns 1,0 or -1 according to my own criteria as listed above, am I right?"

You are right there. Create a sub that, when given any 2 atoms, returns 1 if the first is larger than the second, 0 if they are equal, or -1 if the first is smaller than the second, for whatever definition of larger, equal and smaller you are using. (Note: use $a and $b for the first and second atom given to the sub, as in my example below.)

As for the inline/anonymous subs for sort, they are fairly simple. For example, if you had:

You could replace it with:

or even:

as a sub implicitly returns the value of the last statement evaluated. That's pretty much all there is to using an anonymous sub with sort. Of course, if you are using the same sort in more than 1 place, it would be better to make a named sub; that way you only have to change 1 thing if you decide to sort differently later.
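The three forms described above might look roughly like this. The sub name `by_number`, the variable `$by_number_ref` and the sample data are made up for illustration and are not Paladin's original snippets:

```perl
use strict;
use warnings;

my @list = ( 10, 9, 100, 1 );

# 1. A named comparison sub. sort places the two items being compared
#    in the package variables $a and $b before each call.
sub by_number { return $a <=> $b }
my @named = sort by_number @list;

# 2. The same comparison as an anonymous sub held in a scalar.
my $by_number_ref = sub { return $a <=> $b };
my @anon = sort $by_number_ref @list;

# 3. An inline block with no explicit return: the value of the last
#    expression evaluated is what sort sees.
my @inline = sort { $a <=> $b } @list;

print "@named\n@anon\n@inline\n";
```

All three produce the same ordering; the named form only pays off when the same comparison is needed in several places.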
by seaver (Pilgrim) on Oct 01, 2003 at 19:25 UTC
OK, where do $a and $b come from? Are they implicit within 'foo', or are they from 'sort' itself?

I ask because I have a slight OO problem: $a and $b in my case are actually unique reference numbers to atoms in $self->{'atoms'}. The function I do the sort in is in the same object and has '$self' defined, but if the 'foo' function resides in the same object, how do I pass $self, or else access the object's hash in order to access the atom objects themselves?

In other words, in foo, I really need to do this:

Any pointers?

Cheers
Sam
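One possible approach, sketched under assumptions: $a and $b are package variables that sort itself sets before each comparison, so an inline block written inside the method can use them while still seeing the method's lexical `$self`. The class `Residue`, the method `sorted_atom_ids` and the `name` field below are hypothetical, and a plain string `cmp` stands in for whatever real comparison is wanted:

```perl
use strict;
use warnings;

package Residue;    # hypothetical class, for illustration only

sub new {
    my ( $class, %atoms ) = @_;
    return bless { atoms => \%atoms }, $class;
}

# Return the atom reference numbers, ordered by the name of the
# atom that each number refers to.
sub sorted_atom_ids {
    my ($self) = @_;
    my $atoms = $self->{atoms};
    # The block is compiled inside this method, so it can see $self
    # and $atoms; $a and $b are the two ids sort is comparing.
    return sort { $atoms->{$a}{name} cmp $atoms->{$b}{name} } keys %$atoms;
}

package main;

my $res = Residue->new(
    101 => { name => 'CB' },
    102 => { name => 'CA' },
    103 => { name => 'N' },
);
my @ids = $res->sorted_atom_ids;

print "@ids\n";
```

Nothing has to be passed to the comparison explicitly, because the block closes over the lexicals of the enclosing method.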
|
Re: sorting according to greek alphabet in roman letters
by qq (Hermit) on Oct 01, 2003 at 22:50 UTC
Sorting functions are not that hard: don't be afraid, read the docs, try a simple one or two. However, the logic on something like this can get involved. I often like to transform the original value into a string that will sort easily using a very simple sort function.

In this case, I transform all atoms into a string that will sort asciibetically. A regex matches the different components of the name. Then we look up the sort values from a couple of handy hashes, and fill in default values (no value for a component means sort early).

Caveat: I know nothing about atoms, so I'm working purely from your description. I've two weighting hashes in the script. The one called 'main' doesn't correspond exactly to the 'main chain' bits described, but rather corresponds to the bit that isn't a Greek or number part.

Well, the explanatory paragraph may not make sense, but hopefully the code will be clearer:
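A sketch of the transform-to-a-sortable-string idea, not qq's original script. The hashes `%backbone` and `%greek`, the helper `sort_key` and the four-character key layout are assumptions, as is the restriction to single-letter elements:

```perl
use strict;
use warnings;

# Assumed lookup tables, chosen to reproduce the Tyrosine ordering.
my %backbone = ( N => 0, CA => 1, C => 2, O => 3 );
my %greek    = ( A => 0, B => 1, G => 2, D => 3, E => 4, Z => 5, H => 6 );

# Build a fixed-width key: tier, distance, heavy-atom number,
# hydrogen number (hypothetical helper).
sub sort_key {
    my ($name) = @_;
    return sprintf( '0%d00', $backbone{$name} ) if exists $backbone{$name};

    my ( $hnum, $element, $dist, $num ) =
        $name =~ /^(\d?)([A-Z])([ABGDEZH])(\d?)$/
        or return '9999';    # anything unparsable sorts last

    # Missing components default to 0, so they sort early.
    return sprintf( '%d%d%d%d',
        ( $element eq 'H' ? 2 : 1 ), $greek{$dist}, $num || 0, $hnum || 0 );
}

my @atoms = qw( HH 3HB CE2 O HD1 CZ N CD1 2HB OH CA HE2 CG C HD2 CB CE1 HE1 CD2 );

# Compute each key once, then compare the keys as plain strings.
my %key;
$key{$_} = sort_key($_) for @atoms;
my @sorted = sort { $key{$a} cmp $key{$b} } @atoms;

print "@sorted\n";
```

Because every key has the same width, a plain `cmp` is enough and no custom comparison logic is needed inside the sort block.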
Which gives:
by qq (Hermit) on Oct 02, 2003 at 06:30 UTC
Woke up this morning thinking it could be much simpler. This time I do use a numeric sort:
Gives:
by seaver (Pilgrim) on Oct 02, 2003 at 18:01 UTC
Cheers