I think that it is possible to do this with one loop, and almost no math!

However, I could be missing something, so read, digest, cogitate and draw your own conclusions.

For example, if you have an atom located at point ( 10, 10, 10 ), and the cut-off distance is 1 unit, then any atom that will be paired with this atom will have to be somewhere within an box defined by the points

p( 09, 09, 09 ) p( 09, 11, 09 ) p( 09, 11, 11 ) p( 09, 11, 11 ) p( 09, 11, 11 ) p( 11, 09, 09 ) p( 11, 09, 11 ) p( 11, 11, 11 )

So, instead of storing the x:y:z in separate arrays, or some other complex data structure, create a single array, indexed by the concatentations of x:y:z, (with leading zeros).

$atoms[ 090909 ] = 'name1'; $atoms[ 091109 ] = 'name2'; $atoms[ 091111 ] = 'name3'; $atoms[ 091111 ] = 'name4'; $atoms[ 091111 ] = 'name5'; $atoms[ 110909 ] = 'name6'; $atoms[ 110911 ] = 'name7'; $atoms[ 111111 ] = 'name8';

Once you have created this (sparse) array, it becomes very easy to pick out the limited set of possible pairings for any given point. It requires a final math check to exclude those point lying in the corners of the box, but the number of calculations should be significantly less than the brute force method.

There are some problems with this.

  1. I'm using integer coordinates, yours may well not be.

    If so, I would use a scaling factor upon each of the three coordinates to bring them to some manageable range of integers, and store the real coordinates along with the name. I would do this as a single string rather than as an array ref, as it costs very little extra memory to store a longer scalar in an array element, but creating thousands of small arrays costs lots.

  2. Depending upon how easily your coordinates map to integer ranges, and the size of those ranges, the size of the array will likely be enormously sparse, and too big for memory.

    In which case it makes sense to use a hash to implement a sparse array. The downsides of this is that the hash takes substantially more memory, and you then need to sort the hash keys prior to processing them.

    Depending upon how meany thousands of atoms we are talking about, this could well lead you to requiring more memory than is available, or even possible (assuming a 32-bit processor).

At this point the solution becomes to process the files and create a single output file where each record is of the form

xxyyzz name x.xx y.yy z.zz

A single pass to create the file. Then sort the file (using your systems sort utility) and the a single pass to read the sorted output and do the final comparisons and write out the pairs in whatever format you need them.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

In reply to Re: Processing data with lot of math... by BrowserUk
in thread Processing data with lot of math... by qhayaal

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.