Re^2: structuring data: aka walk first, grok later

I'm gonna try to keep this short b/c it seems I get into trouble when I "yammer on".

In re: the non-integer values for x and y. The full explanation is mathematically forbidding but has to do with the nature of the detecting instrument: coordinate systems are actually transformed somewhat during data processing.

But to cut it short: in my first version of the program I converted these values to integers anyway (a sanctioned move - I didn't just decide to do that on my own). I muddied the issue by posting the full-precision values here.

In re: what I am trying to do. Essentially I am trying to find places on the detectors where "hits", represented by the pixel values, seem to "bunch up." These hits are places on the detectors where photons have struck. A count of hits in x or y that goes over a predetermined threshold _may_ indicate something that needs to be looked at more closely (e.g. by human eyes).

The first version of the program took a list of observation sessions, represented by numbers, as input. For each observation session, it did a database call to find out which of the seven detectors/CCDs were involved.

THEN, for each detector in that observation, it did a database call to pull in the data for the "hits", populating an array for the x axis and one for the y axis of the detector.

Then it iterated over those built-up arrays for x and y, kind of doing a histogram in memory (repeat for each detector, then move on to the next observation) ...
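A minimal sketch of that kind of in-memory histogram, for the x axis of one detector in one observation. The sample values, the threshold, and the variable names are made up for illustration; this is not the original program.

```perl
use strict;
use warnings;

# Hypothetical sample: x coordinates of the hits on one detector
# in one observation, already converted to integers.
my @x_hits = (12, 12, 13, 40, 12, 13, 99, 12);

# Hypothetical threshold: flag any coordinate hit more than this many times.
my $threshold = 3;

# Count the hits per coordinate (the "histogram in memory").
my %count;
$count{$_}++ for @x_hits;

# Keep only the coordinates whose count goes over the threshold.
my @suspect = sort { $a <=> $b } grep { $count{$_} > $threshold } keys %count;

print "x=$_ was hit $count{$_} times\n" for @suspect;
```

The same loop would be repeated for the y array, and then for each detector in turn.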

I must emphasize: this approach worked. But it's apparently inefficient, especially in terms of time (total run time: 19 minutes) spent doing db calls. So I figured out how to pull all the data in first. This takes only 2 minutes.

All the lines of the lump are like this:

$observation, $detector, $x_coord, $y_coord

Now I keep getting stuck trying to get the big lump to do what I want:

... to give me an array of the x values and an array of the y values for a SINGLE detector in a SINGLE observation. And so on, through the lump, until I am done. I need to examine the DISTRIBUTION of values in x and y axes of each detector, in each observation, individually.
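One possible way to "address" those subsets is a hash of hashes keyed by observation and then by detector, with an x array and a y array at the bottom. This is only a sketch: the @lump rows below are invented sample data standing in for the real query result, the %hits name is hypothetical, and int() truncates rather than rounds, which may or may not match the sanctioned integer conversion.

```perl
use strict;
use warnings;

# Hypothetical sample of the "lump": one row per hit, in the order
# observation, detector, x_coord, y_coord.
my @lump = (
    [ 1001, 3, 512.4, 77.9 ],
    [ 1001, 3, 512.6, 80.1 ],
    [ 1001, 5, 10.2,  11.7 ],
    [ 1002, 3, 300.0, 42.5 ],
);

# $hits{$observation}{$detector}{x} and {y} each hold an array reference.
my %hits;
for my $row (@lump) {
    my ($observation, $detector, $x_coord, $y_coord) = @$row;

    # Autovivification creates the nested hashes and arrays on first use.
    push @{ $hits{$observation}{$detector}{x} }, int $x_coord;
    push @{ $hits{$observation}{$detector}{y} }, int $y_coord;
}

# Visit each detector of each observation individually.
for my $observation (sort { $a <=> $b } keys %hits) {
    for my $detector (sort { $a <=> $b } keys %{ $hits{$observation} }) {
        my @x = @{ $hits{$observation}{$detector}{x} };
        my @y = @{ $hits{$observation}{$detector}{y} };
        print "obs $observation, det $detector: x=(@x) y=(@y)\n";
    }
}
```

Inside the inner loop, @x and @y are exactly the two arrays for a SINGLE detector in a SINGLE observation, so whatever per-detector analysis already works can be run on them unchanged.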

Maybe I should be satisfied with my 19 minute runtime, and leave the data munging / structures alone until I am more experienced ... ? I don't know.

Do I need a data structure? I don't know that either. It feels like I do, because without one I don't know how to "address" subsets of the lump of data.

I hope that's clearer, anyway. I don't know why I am so stuck, and I am sorry I am.

Re^3: structuring data: aka walk first, grok later by BrowserUk (Patriarch) on Jun 05, 2008 at 23:15 UTC
If I understand you correctly, then each datapoint (O,D,X,Y) represents one photon hitting a pixel (X,Y) of a detector (D) during an observation (O). And that pixel may be struck zero, one or many times during a given observation. If it is hit more than once, then there will be multiple, identical (O,D,X,Y) datapoints in the dataset for that detector/observation pairing? Is that correct?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^4: structuring data: aka walk first, grok later by chexmix (Hermit) on Jun 06, 2008 at 01:39 UTC
No. Sorry again for my lack of clarity. I was told that each (x, y) pair is s/t that the system has recorded as a positive id, i.e. it represents a 'thing' that has been recorded as having been observed. But due to the nature of the instrument, some or many of these may be spurious: "streaks" on the detector, for example. Such things will show up as a "pileup" of points within a given window (so many pixels wide) in x or y.
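A rough sketch of how such a pileup within a window might be counted, using a sliding window over the sorted values of one axis. The sample data, the window width, and the limit are all hypothetical, and this is one possible technique, not necessarily what the existing program does.

```perl
use strict;
use warnings;

# Hypothetical x values for one detector in one observation.
my @x      = (5, 100, 101, 102, 103, 250, 251, 400);
my $window = 4;    # hypothetical window width, in pixels
my $limit  = 3;    # hypothetical: more points than this in one window is a pileup

my @sorted = sort { $a <=> $b } @x;
my @pileups;       # each entry is [ lowest value in window, number of points ]
my $lo = 0;

for my $hi (0 .. $#sorted) {
    # Advance the left edge until the window spans fewer than $window pixels.
    $lo++ while $sorted[$hi] - $sorted[$lo] >= $window;

    my $n = $hi - $lo + 1;
    push @pileups, [ $sorted[$lo], $n ] if $n > $limit;
}

print "pileup of $_->[1] points starting at $_->[0]\n" for @pileups;
```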
Re^5: structuring data: aka walk first, grok later by BrowserUk (Patriarch) on Jun 06, 2008 at 01:46 UTC
"I was told that each (x, y) pair is s/t"

Great. An "s/t"! Sorry. Trying to understand your problem has defeated me. I think that may be your problem too. You cannot hope to solve a problem until you understand it. And one usually sure way of checking your understanding is to try to describe it to someone unfamiliar with the field.

