Re: How can I improve the efficiency of this very intensive code?

by sk (Curate)
on Aug 06, 2005 at 21:23 UTC


in reply to How can I improve the efficiency of this very intensive code?

I feel a hash might not be required for your task. You have a recordID that can act as an index into an array, so why put the records in a hash and mess up the order? An array gives you direct access by index anyway, and none of the hash-table overhead.

That said, I would use an n×n matrix of scores (square is not a requirement; the dimensions change with the number of records, of course). Consider the following table:

    File1rec/File2rec | 1 | 2 |  3 | 4 | 5 |  6
    ------------------+---+---+----+---+---+---
            1         | 3 | 7 |  8 | 9 | 9 | 10
            2         | 4 | 8 |  3 | 1 | 1 |  6
            3         | 1 | 4 |  9 | 4 | 9 |  7
            4         | 4 | 3 | 10 | 7 | 2 |  3
            5         | 4 | 2 |  5 | 9 | 9 |  5
            6         | 5 | 6 |  2 | 5 | 6 |  9
The values inside the cells are the scores. Now if you want the best matching score (the max value), an O(n) scan over a row gives you the answer for that record, and you do that n times, once for each record in your first file.

Sorting just to find the max/min is overkill. I might be misreading your problem, so please correct me if I am wrong.
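To make the idea concrete, here is a minimal sketch (the score values are just the ones from the table above; the variable names are mine, not from the OP's code). One linear pass per row replaces the sort:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical score matrix: $score[$i][$j] holds the score for
# record $i of file 1 against record $j of file 2.
my @score = (
    [ 3, 7,  8, 9, 9, 10 ],
    [ 4, 8,  3, 1, 1,  6 ],
    [ 1, 4,  9, 4, 9,  7 ],
    [ 4, 3, 10, 7, 2,  3 ],
    [ 4, 2,  5, 9, 9,  5 ],
    [ 5, 6,  2, 5, 6,  9 ],
);

# For each file-1 record, a single O(n) scan of its row finds the
# best file-2 match -- no sorting anywhere.
for my $i ( 0 .. $#score ) {
    my ( $best_j, $best ) = ( 0, $score[$i][0] );
    for my $j ( 1 .. $#{ $score[$i] } ) {
        ( $best_j, $best ) = ( $j, $score[$i][$j] )
            if $score[$i][$j] > $best;
    }
    printf "file1 record %d best matches file2 record %d (score %d)\n",
        $i + 1, $best_j + 1, $best;
}
```

That is O(n) per record instead of the O(n log n) a sort would cost, and it never disturbs the record order.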

cheers

SK


Replies are listed 'Best First'.
Re^2: How can I improve the efficiency of this very intensive code?
by clearcache (Beadle) on Aug 06, 2005 at 21:41 UTC

    I was thinking about the use of arrays...my ids are pretty big numbers, so I wouldn't use them alone as array indices. I could always use $. when I read in the file, though, rather than the id.

    My ranking is based on the number of seconds from the last log entry in one file to the first log entry in the second file, so I create the scoring by looking at the number of seconds between each pair of records. My ability to identify a "strong match" comes from the rate of concurrent users in the application my data comes from. With few concurrent users I'll have lots of strong matches - records that clearly line up. With many concurrent users and lots of log file entries, I've got to get a little creative.

    I was sorting b/c my hash is being used to store # of elapsed seconds...not a true "rank" in terms of 1, 2, 3, etc.

    I'm considering the use of arrays, but don't want to lose the elapsed seconds as data quite yet b/c that will be used in the next step to figure out the best match from the remaining data.
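Both points (index by $. instead of the big ids, and keep the elapsed seconds around for the next step) can coexist; here is a hypothetical sketch, with made-up record ids and field layout since the actual log format isn't shown:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index by input line number ($.) instead of by the large record id;
# keep the id and the elapsed seconds in parallel arrays so the
# seconds survive for the next matching step.
my ( @id, @elapsed );
while ( my $line = <DATA> ) {
    chomp $line;
    my ( $rec_id, $secs ) = split /,/, $line;
    $id[$.]      = $rec_id;    # $. is 1-based, so index 0 stays unused
    $elapsed[$.] = $secs;
}

# Best match = smallest elapsed gap; one linear pass, no sort.
my $best = 1;
for my $i ( 2 .. $#elapsed ) {
    $best = $i if $elapsed[$i] < $elapsed[$best];
}
print "best match: id $id[$best], $elapsed[$best] seconds\n";

__DATA__
9000000123,42
9000000456,7
9000000789,19
```

Because @elapsed still holds the raw gaps (not just a rank), the remaining candidates can be re-scored in a later pass without re-reading the files.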
