I reworked this program and significantly improved performance. There were some mysterious discrepancies in the result set between the old and new versions on one run, but I believe I have those 'figured out.'
Partial profiles of the old and new versions follow. I am cautiously considering this a success:
First version of program:
time elapsed (wall):   1473.9343
time running program:  1473.2193  (99.95%)
time profiling (est.):    0.7150  ( 0.05%)
number of calls:       59722

%Time       Sec.  #calls  sec/call  F  name
92.33  1360.2230    2427  0.560454     DBI::st::execute
 3.64    53.5727    2027  0.026430     main::process_x
 3.58    52.7029    2007  0.026260     main::process_y
 0.15     2.2193       1  2.219282     Term::ReadKey::ReadLine
 0.10     1.4189       0  1.418933  *  <other>
 0.06     0.8885   24294  0.000037     DBI::st::fetchrow_array
Revised program:
time elapsed (wall):   408.6156
time running program:  408.2747  (99.92%)
time profiling (est.):   0.3409  ( 0.08%)
number of calls:       32883

%Time      Sec.  #calls   sec/call  F  name
70.21  286.6553     510   0.562069     DBI::st::execute
24.20   98.7912    4034   0.024490     main::process
 4.79   19.5629       1  19.562895     Term::ReadKey::ReadLine
 0.27    1.1126       0   1.112580  *  <other>
 0.16    0.6666   20460   0.000033     DBI::st::fetchrow_array
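The gain is easiest to see in the DBI::st::execute line: 2427 calls down to 510. I won't reproduce the real queries here, but the usual shape of this kind of fix is to prepare a statement once, with placeholders, and re-execute it per observation. A minimal sketch only -- the DSN, table, and column names below are invented for illustration:

use strict;
use warnings;
use DBI;

# Hypothetical DSN and credentials -- substitute your own.
my $dbh = DBI->connect( 'dbi:Pg:dbname=scratch', 'user', 'pass',
                        { RaiseError => 1 } );

# Prepare ONCE, outside the loop ...
my $sth = $dbh->prepare('SELECT obs, det, x, y FROM hits WHERE obs = ?');

# ... then only execute() inside it, once per observation.
for my $obs (21, 47, 71, 88) {
    $sth->execute($obs);
    while ( my ($o, $det, $x, $y) = $sth->fetchrow_array ) {
        # ... process one hit ...
    }
}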
NOW ON TO THE ORIGINAL POST ...
Good morning Monks -
The poet Charles Olson once wrote, memorably:
I have had to learn the simplest things
last. Which made for difficulties.
This kind of sums up my situation vis-a-vis Perl, I think. I have been flummoxed for the past few days: my lack of substantive CS background has (once again) been chewing a hole in my ... er, back.
This post is in a sense a followup to my earlier post about profiling, and yet isn't about DBI at all, but more about data structures.
I have found that I can essentially grab ALL the data I need to process (for the task outlined in the previous post) with ONE database call per line of input. What comes down from that series of calls looks like this:
21 DET-2 896.657564735788 678.83860967799
21 DET-3 32.0939023018969 621.656550474314
21 DET-3 42.0741462550974 834.842294892622
21 DET-3 218.814294809857 450.606540154849
21 DET-3 228.88830316475 625.939190221948
21 DET-3 630.472705847461 220.839350101088
21 DET-5 152.988115061449 156.31861287082
21 DET-5 730.997702224652 507.421683707195
21 DET-6 506.364456847517 587.275663167673
21 DET-6 573.109998216762 116.126667780714
21 DET-6 885.306844616344 411.352928714465
21 DET-6 959.150025915228 845.316911114704
21 DET-7 62.7170088137102 593.424801945024
21 DET-7 110.245168119381 788.219885220784
21 DET-7 159.254569896235 386.365906980404
21 DET-7 377.53529067825 163.659365696494
21 DET-7 736.734267414092 129.235251032426
21 DET-7 836.081539763363 401.860540038111
21 DET-8 736.566372536132 247.410290038796
47 DET-7 189.488040387042 500.316501378612
47 DET-7 251.972954527148 519.649226713148
71 DET-7 188.133043154801 499.94217650742
71 DET-7 251.06636137579 519.007465693828
88 DET-0 0.70684189743067 391.883292824418
88 DET-0 114.871177986263 212.959076023136
88 DET-0 219.421725079137 710.314439572696
88 DET-0 257.837516726887 594.376577764894
88 DET-1 119.630462310966 260.433234269099
...
In each line, the first value is an "observation number," the second a "detector number" and the third and fourth values are the x and y coordinates of actual "hits" on the detectors.
I have edited this sample down from the roughly 19,000 lines, but wanted to leave enough to show that a single observation can register hits on several detectors, and that the number of hits per detector varies.
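Just to pin the field layout down, here is a tiny self-contained snippet that splits a few of the sample rows into those four fields (whitespace-separated, so a plain split ' ' does the job):

use strict;
use warnings;

while ( my $line = <DATA> ) {
    my ($obs, $det, $x, $y) = split ' ', $line;
    print "observation=$obs  detector=$det  x=$x  y=$y\n";
}

__DATA__
21 DET-2 896.657564735788 678.83860967799
21 DET-3 32.0939023018969 621.656550474314
47 DET-7 189.488040387042 500.316501378612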
So I have been facing the roaring Godzilla that is my lack of experience with data structures, and trying to figure out what might be the best structure I could put this in for processing ...
My first attempt was a hash of arrays, which yielded something like this ...
21 => DET-2, 896.657564735788, 678.83860967799,
      DET-3, 32.0939023018969, 621.656550474314,
      DET-3, 42.0741462550974, 834.842294892622,
      DET-3, 87.5412177704422, 684.850417188863,
      DET-3, 92.9823463716063, 216.339020594075,
      DET-3, 175.151394732114, 525.441189179707,
      DET-3, 218.814294809857, 450.606540154849,
      DET-3, 228.88830316475, 625.939190221948,
      DET-3, 630.472705847461, 220.839350101088,
      DET-5, 152.988115061449, 156.31861287082,
      DET-5, 730.997702224652, 507.421683707195,
      DET-6, 784.608063532865, 688.699410601935,
      DET-6, 885.306844616344, 411.352928714465,
      DET-6, 959.150025915228, 845.316911114704,
      DET-7, 62.7170088137102, 593.424801945024,
47 => DET-7, 189.488040387042, 500.316501378612,
      DET-7, 251.972954527148, 519.649226713148,
71 => DET-7, 188.133043154801, 499.94217650742,
      DET-7, 251.06636137579, 519.007465693828,
... note: this data may not quite agree with the sample above; I have cut it down for clarity, and it is mostly for illustration purposes.
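The loop that produced it was, in essence, a flat push of all three fields onto a single array per observation number -- roughly the following, with the DATA section standing in for the database rows:

use strict;
use warnings;
use Data::Dumper;

my %hoa;
while ( my $line = <DATA> ) {
    my ($obs, $det, $x, $y) = split ' ', $line;
    # Flat push: the detector name and both coordinates all land
    # in one undifferentiated list per observation.
    push @{ $hoa{$obs} }, $det, $x, $y;
}
print Dumper(\%hoa);

__DATA__
21 DET-2 896.657564735788 678.83860967799
21 DET-3 32.0939023018969 621.656550474314
21 DET-3 42.0741462550974 834.842294892622
47 DET-7 189.488040387042 500.316501378612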
But this clearly isn't processed enough: those repeated "DET" values suggest that what I really want is to "deepen" the structure one more level and "pull out," as it were, the detector numbers into keys of their own. And it is here that I get stuck, both in terms of "what would be best" and "how do I do that?"
Even perldsc only goes so far in terms of complexity.
At first I thought "it must be a hash of hashes of arrays that I want," and I uncovered this node showing how to create such a thing. BUT to be quite honest, I didn't or couldn't or can't or currently am not able to truly grok the solutions presented at that node. And IS this the best structure for me?
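If I squint at the examples there, I think the building step would look something like this, with autovivification doing the heavy lifting (again, I am not at all sure this is the right structure, which is partly the point of this post):

use strict;
use warnings;
use Data::Dumper;

my %data;
while ( my $line = <DATA> ) {
    my ($obs, $det, $x, $y) = split ' ', $line;
    # Autovivification creates $data{$obs} and $data{$obs}{$det}
    # on first use; each hit becomes its own [x, y] pair.
    push @{ $data{$obs}{$det} }, [ $x, $y ];
}
print Dumper(\%data);

# %data would then look like:
#   21 => {
#     'DET-2' => [ [ 896.657564735788, 678.83860967799 ] ],
#     'DET-3' => [ [ 32.0939023018969, 621.656550474314 ],
#                  [ 42.0741462550974, 834.842294892622 ] ],
#   },
#   47 => { 'DET-7' => [ [ 189.488040387042, 500.316501378612 ] ] },

__DATA__
21 DET-2 896.657564735788 678.83860967799
21 DET-3 32.0939023018969 621.656550474314
21 DET-3 42.0741462550974 834.842294892622
47 DET-7 189.488040387042 500.316501378612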
So my questions, I am afraid, are three, which is perhaps a function of the lack of clarity in my thinking:

1. Is a hash of hashes of arrays really the best structure for this data, or is there something better?
2. If it is, how do I build it from the rows coming back from the database?
3. Once it is built, how do I walk it to do my per-detector processing?
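On that third question, my (shaky) understanding is that walking such a structure comes down to nested loops over sorted keys. A self-contained sketch, using a hand-built slice of the data above:

use strict;
use warnings;

my %data = (
    21 => {
        'DET-2' => [ [ 896.657564735788, 678.83860967799 ] ],
        'DET-3' => [ [ 32.0939023018969, 621.656550474314 ],
                     [ 42.0741462550974, 834.842294892622 ] ],
    },
    47 => {
        'DET-7' => [ [ 189.488040387042, 500.316501378612 ] ],
    },
);

# Numeric sort on observation numbers, string sort on detector names.
for my $obs ( sort { $a <=> $b } keys %data ) {
    for my $det ( sort keys %{ $data{$obs} } ) {
        for my $hit ( @{ $data{$obs}{$det} } ) {
            my ($x, $y) = @$hit;
            print "$obs  $det  $x  $y\n";
        }
    }
}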
Apologies for the length of this post. I hope there is something of interest in it. I am, once again, feeling stuck and frustrated. I know it's no one's responsibility to help me out of my thought ditch, but if anyone has any maps to recommend, I would be grateful.
Regards,
An extremely humble Monk
In reply to structuring data: aka walk first, grok later by chexmix