Re: How can I improve the efficiency of this very intensive code?

by sk (Curate)
on Aug 06, 2005 at 21:23 UTC


in reply to How can I improve the efficiency of this very intensive code?

I feel a hash might not be required for your task. You have a recordID that can act as an index into an array, so why put the records in a hash and mess up the order? An array gives you direct access by index anyway, and none of the hash-table overhead.

That said, I would use an n×n matrix of scores (square is not a requirement; the dimensions change with the number of records, of course). Consider the following table:

    File1rec/File2rec | 1 | 2 |  3 | 4 | 5 |  6
    ------------------+---+---+----+---+---+---
            1         | 3 | 7 |  8 | 9 | 9 | 10
            2         | 4 | 8 |  3 | 1 | 1 |  6
            3         | 1 | 4 |  9 | 4 | 9 |  7
            4         | 4 | 3 | 10 | 7 | 2 |  3
            5         | 4 | 2 |  5 | 9 | 9 |  5
            6         | 5 | 6 |  2 | 5 | 6 |  9
The values inside the cells are the scores. Now if you want the best matching score (the max value), an O(n) scan over a row gives you the answer for that record, and you do that n times, once for each record in your first file.

Sorting just to find the max/min is overkill. I might be misreading your problem, so please correct me if I am wrong.
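To make the idea concrete, here is a minimal sketch (the score values are just the ones from the table above; the variable names are mine, not from the OP's code). One linear pass per row replaces the sort:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical score matrix: $score[$i][$j] holds the score for
# record $i of file 1 against record $j of file 2.
my @score = (
    [ 3, 7,  8, 9, 9, 10 ],
    [ 4, 8,  3, 1, 1,  6 ],
    [ 1, 4,  9, 4, 9,  7 ],
    [ 4, 3, 10, 7, 2,  3 ],
    [ 4, 2,  5, 9, 9,  5 ],
    [ 5, 6,  2, 5, 6,  9 ],
);

# For each file-1 record, a single O(n) scan of its row finds the
# best file-2 match -- no sorting anywhere.
for my $i ( 0 .. $#score ) {
    my ( $best_j, $best ) = ( 0, $score[$i][0] );
    for my $j ( 1 .. $#{ $score[$i] } ) {
        ( $best_j, $best ) = ( $j, $score[$i][$j] )
            if $score[$i][$j] > $best;
    }
    printf "file1 record %d best matches file2 record %d (score %d)\n",
        $i + 1, $best_j + 1, $best;
}
```

That is O(n) per record instead of the O(n log n) a sort would cost, and it never disturbs the record order.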

cheers

SK


Replies are listed 'Best First'.
Re^2: How can I improve the efficiency of this very intensive code?
by clearcache (Beadle) on Aug 06, 2005 at 21:41 UTC

    I was thinking about the use of arrays...my ids are pretty big numbers, so I wouldn't use them alone as array indices. I could always use $. when I read in the file, though, rather than the id.

    My ranking is based on the number of seconds from the last log entry in one file to the first log entry in the second file, so I create the scoring by looking at the number of seconds between each pair of records. My ability to identify a "strong match" comes from the rate of concurrent users in the application my data comes from. With few concurrent users I'll have lots of strong matches - records that clearly line up. With many concurrent users and lots of log file entries, I've got to get a little creative.

    I was sorting b/c my hash is being used to store # of elapsed seconds...not a true "rank" in terms of 1, 2, 3, etc.

    I'm considering the use of arrays, but don't want to lose the elapsed seconds as data quite yet b/c that will be used in the next step to figure out the best match from the remaining data.
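Both points (index by $. instead of the big ids, and keep the elapsed seconds around for the next step) can coexist; here is a hypothetical sketch, with made-up record ids and field layout since the actual log format isn't shown:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index by input line number ($.) instead of by the large record id;
# keep the id and the elapsed seconds in parallel arrays so the
# seconds survive for the next matching step.
my ( @id, @elapsed );
while ( my $line = <DATA> ) {
    chomp $line;
    my ( $rec_id, $secs ) = split /,/, $line;
    $id[$.]      = $rec_id;    # $. is 1-based, so index 0 stays unused
    $elapsed[$.] = $secs;
}

# Best match = smallest elapsed gap; one linear pass, no sort.
my $best = 1;
for my $i ( 2 .. $#elapsed ) {
    $best = $i if $elapsed[$i] < $elapsed[$best];
}
print "best match: id $id[$best], $elapsed[$best] seconds\n";

__DATA__
9000000123,42
9000000456,7
9000000789,19
```

Because @elapsed still holds the raw gaps (not just a rank), the remaining candidates can be re-scored in a later pass without re-reading the files.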
