Sorry, but you are still couching this in terms I don't really understand. They may be second nature to you, but the terms "distances", "N-terminals", "main chains", etc. all have a meaning to me, just not one that makes any sense in this context. If you want a good response, try to phrase your questions in terms that people without your specialist knowledge can understand.

Every atom name breaks down into 4 parts, which I already have broken up. The function atomName simply returns the concatenation of all 4 parts.

Columns 1 & 2 are the actual atomic element (1 or 2 letters, right-justified), except in the case of hydrogens, where column 1 is actually the number of the hydrogen.

Column 3 is the distance, specified by a Greek letter.

Column 4, mostly empty, is the number of the heavy atom at the distance.

e.g. Leucine has two 'CD's, so they are CD1 and CD2. The hydrogens, respectively, are: 1HD1, 2HD1, 3HD1, 1HD2, 2HD2 and 3HD2.
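The four-column layout just described can be pulled apart with `unpack`. This is a minimal sketch, assuming names arrive space-padded to exactly four characters (e.g. `' CA '`, `' CD1'`, `'1HD1'`); `split_atom_name` is a hypothetical helper, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a space-padded, 4-character atom name
# into the four columns described above.
sub split_atom_name {
    my ($name) = @_;
    length($name) == 4 or die "Expected exactly 4 characters, got '$name'\n";
    # Column 1: first letter of a two-letter element, or the hydrogen's number
    # Column 2: the element (right-justified, so its last letter)
    # Column 3: "distance" (Roman letter standing in for a Greek one)
    # Column 4: number of the heavy atom at that distance (often blank)
    my @cols = unpack 'a1 a1 a1 a1', $name;
    tr/ //d for @cols;    # blank columns become empty strings
    return @cols;
}

for my $name ( ' CA ', ' CD1', '1HD1', '3HD2' ) {
    my @parts = split_atom_name($name);
    print "'$name' => (", join( ', ', map { sprintf "'%s'", $_ } @parts ), ")\n";
}
```

Note that under this layout `CA` splits into element `C` plus distance `A` (alpha), which is why the code further down treats it as a unit.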

What this tells me is that you have the four parts that decide the sort order already separated, but that you are concatenating them together in order to sort them. This results in a variable-length entity, where …

I realise that I have exaggerated this somewhat. It is possible to infer some of this (or rule out some possibilities) from your description, but there are still enough left to make it extremely difficult to know where to begin in trying to help you.

Suffice to say, if the rest of your application could be bent to handling the atoms as arrays or hashes where the parts are kept separate, it would make the sorting easier. Nevertheless, I *think* that this would do the trick. It's a fairly standard Schwartzian Transform (ST) in its basic layout, but the devil is in the details. The values used for the comparisons are numerical weights, calculated in the first mapping.
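To illustrate the "keep the parts separate" alternative: with the parts held apart, no weights are needed at all. Perl's `sort` can chain comparisons with `||`, falling through to the next key only on a tie. This is a minimal sketch, not the approach the code in this post takes. It assumes each atom is an array ref of `[ main atom, Greek letter, number ]`, and it uses hypothetical tables `%main_rank` and `%greek_rank` that mirror `%mainAtoms` and `%distances` from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rank tables. %main_rank mirrors %mainAtoms in the post;
# %greek_rank orders the Roman stand-ins like the Greek alphabet
# (beta, gamma, delta, epsilon, zeta, eta), with '' meaning "no letter".
my %main_rank  = ( N => 1, CA => 2, C => 3, O => 4, H => 5,
                   '2H' => 6, '3H' => 7, '4H' => 8, '5H' => 9 );
my %greek_rank = ( '' => 0, B => 1, G => 2, D => 3, E => 4, Z => 5, H => 6 );

# Each atom kept as separate parts: [ main atom, Greek letter, number ].
# Same names as the test data in the post, just not concatenated.
my @atoms = (
    [ '2H', 'B', '' ], [ '3H', 'B', '' ], [ 'C',  '',  '' ], [ 'CA', '', '' ],
    [ 'C',  'B', '' ], [ 'C',  'G', '' ], [ 'C',  'D', 1 ],  [ 'C',  'D', 2 ],
    [ 'C',  'E', 1 ],  [ 'C',  'Z', '' ], [ 'C',  'E', 2 ],  [ 'H',  'E', 2 ],
    [ 'H',  'E', 1 ],  [ 'H',  'H', '' ], [ 'H',  'D', 1 ],  [ 'H',  'D', 2 ],
    [ 'N',  '',  '' ], [ 'O',  '',  '' ], [ 'O',  'H', '' ],
);

# Chain the keys with ||: each comparison only matters when all the
# previous ones tied (returned 0).
my @sorted = sort {
       ( $a->[1] eq '' ? 0 : 1 ) <=> ( $b->[1] eq '' ? 0 : 1 )    # bare main atoms first
    || $main_rank{ $a->[0] }     <=> $main_rank{ $b->[0] }
    || $greek_rank{ $a->[1] }    <=> $greek_rank{ $b->[1] }
    || ( $a->[2] || 0 )          <=> ( $b->[2] || 0 )
} @atoms;

my $result = join ' | ', map { join '', @$_ } @sorted;
print "$result\n";
```

The first key reproduces what the weights achieve implicitly: every name without a Greek letter sorts ahead of every name with one.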

The awkward part was coming up with a mapping that spread the possible variations and combinations into a number space such that the result met your requirements. It uses two hash-table lookups. The first, %mainAtoms, establishes the basic ordering in the range 1 .. 9. The second, %distances, uses floating-point multipliers to map N[BGDEZH] to 10.0 .. 10.5, CA[BGDEZH] to 20.0 .. 21.0, and so on up to 5H[BGDEZH] at 90.0 .. 94.5. Finally, the third (numeric) part of the atom name is divided by 100 and added, which provides the final arbitration.
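The weighting can be written as a single formula, which makes it easy to check that the bands never collide. Here m is the %mainAtoms value, d the %distances value and n the trailing digit (0 when absent). The upper end of each band is 10.5m, and even with the largest digit added it stays below the start of the next band. Within a band, adjacent Greek letters are 0.1m apart, which is more than any digit can add.

```latex
w = m \cdot d + \frac{n}{100}, \qquad
m \in \{1,\dots,9\},\;
d \in \{1,\ 10.0,\ 10.1,\ \dots,\ 10.5\},\;
n \in \{0,\dots,9\}

\text{no Greek letter } (d = 1):\quad w \le 9 + 0.09 < 10

\text{Greek letter}:\quad 10m \le w \le 10.5m + 0.09 < 10(m+1) \qquad (m \le 9)

\text{adjacent letters}:\quad m(d + 0.1) - m\,d = 0.1m \ge 0.1 > 0.09 \ge \frac{n}{100}
```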

#! perl -slw
use strict;

## Lookup table to map main atoms to a numerical value
my %mainAtoms = (
    ''   => 0,
    N    => 1,
    CA   => 2,
    C    => 3,
    O    => 4,
    H    => 5,
    '2H' => 6,
    '3H' => 7,
    '4H' => 8,
    '5H' => 9,
);

## Lookup table for "distance" multiplier
## Using '' => 1 ensures that unadorned main atom weights
## remain in the range 1 to 9.
## Using 10.n for the distance weights
## maps the weights to 10.0 .. 10.5 for N,
## 20.0 .. 21.0 for CA etc.
my %distances = (
    ''  => 1,
    B   => 10.0,
    G   => 10.1,
    D   => 10.2,
    E   => 10.3,
    Z   => 10.4,
    H   => 10.5,
);

## Some test data.
my @unordered = qw[
    2HB 3HB C CA CB CG CD1 CD2 CE1 CZ CE2
    HE2 HE1 HH HD1 HD2 N O OH
];

## The following is a 'standard' Schwartzian Transform.
## You have to read the blocks backwards to understand the process.
my @sorted = map {
    ## This just maps the original value back
    ## from the anonymous array created below
    $_->[ 1 ]
} sort {
    ## This sorts the anonymous arrays according to
    ## the numerical value in element 0 of the anon. arrays.
    ## This is the weight calculated below
    $a->[ 0 ] <=> $b->[ 0 ]
} map {
    ## The first part of the transform extracts the 3 fields
    ## from the catenated atomName into $1, $2, $3 or dies if it fails.
    ## \A and \z anchor the match so that stray characters are rejected.
    m[ \A ( N | CA | O | C | (?: \d?H ) ) ( [BGDEZH] )? ( \d )? \z ]x
        or die "Failed to separate '$_'";

    ## This builds the anon. arrays. The atomName is in ->[ 1 ]
    ## The calculated weight is in ->[ 0 ]
    [
        $mainAtoms{ $1 }              ## 1 .. 9
          * $distances{ $2 || '' }    ## 1 or 10.x
          + ( $3 || 0 ) / 100,        ## 0 or 0.0n
        $_                            ## The atomName
    ]
} @unordered;    ## The unordered data.

## Display the results
print join ' | ', @sorted;

__END__
P:\test>295668
N | CA | C | O | CB | CG | CD1 | CD2 | CE1 | CE2 | CZ | OH | HD1 | HD2 | HE1 | HE2 | HH | 2HB | 3HB
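One thing to check against the naming in the Leucine example: %mainAtoms has no '1H' key, so a name such as 1HD1 gets an undefined main weight. Under -w that produces an "uninitialized value" warning and a weight of 0.01, which sorts it ahead of N. Below is a minimal sketch of one way to guard against that. It wraps the same transform in a hypothetical `sort_atom_names` sub, assumes names may arrive space-padded as in the four-column layout, strips the padding before matching, and dies on any main atom missing from the table. Where 1H should rank is your call, so it is deliberately not added here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same weight tables as the post's code (the unused '' key is dropped).
my %mainAtoms = ( N => 1, CA => 2, C => 3, O => 4, H => 5,
                  '2H' => 6, '3H' => 7, '4H' => 8, '5H' => 9 );
my %distances = ( '' => 1, B => 10.0, G => 10.1, D => 10.2,
                  E => 10.3, Z => 10.4, H => 10.5 );

# Hypothetical wrapper around the same Schwartzian Transform.
# Differences: padding spaces are ignored when computing the weight
# (the original, padded name is still returned), and an unknown main
# atom is a fatal error rather than a silent weight of 0.
sub sort_atom_names {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  {
               ( my $name = $_ ) =~ tr/ //d;
               $name =~ m[ \A ( N | CA | O | C | \d?H ) ( [BGDEZH] )? ( \d )? \z ]x
                   or die "Failed to separate '$_'\n";
               exists $mainAtoms{$1}
                   or die "No weight for main atom '$1' in '$_'\n";
               [ $mainAtoms{$1} * $distances{ $2 // '' } + ( $3 // 0 ) / 100, $_ ]
           } @_;
}

print join( ' | ', sort_atom_names(qw[ CB O C CA N CD2 CD1 ]) ), "\n";

# Call in list context so the whole transform actually runs.
my @rejected = eval { sort_atom_names('1HD1') };
print "Rejected: $@" if $@;
```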

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.


In reply to Re: Re: Re: sorting according to greek alphabet in roman letters by BrowserUk
in thread sorting according to greek alphabet in roman letters by seaver
