in reply to Efficient Data Comparison

I hate to deflate your ego, but you're dealing with a tiny amount of data, assuming a modern machine (less than 5 years old) equipped with modern main memory (256 MB? 512 MB? 1 GB?).

25,000 records, at (pessimistically) 100 bytes per value, take up 2.5 MB, or 1% of a small memory space.

Some people read multi-megabyte log files into a hash for post-processing. You should have no problems at all.
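
If you want a feel for that pattern, here is a minimal sketch of the read-into-a-hash approach (the filename, tab delimiter, and record layout are all made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical: slurp an entire log into a hash keyed on the first
    # field. 25,000 records at ~100 bytes each sit comfortably in memory.
    my %record;
    open my $fh, '<', 'app.log' or die "can't open app.log: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $record{$key} = $value;
    }
    close $fh;
    print scalar(keys %record), " records loaded\n";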

--
TTTATCGGTCGTTATATAGATGTTTGCA

Re^2: Efficient Data Comparison
by Anonymous Monk on Dec 20, 2005 at 20:14 UTC
    Thanks for all the responses. I actually need to compare data points not only across variables but across periods of time as well.

    For example: divide the data point for variable x on 1/1/2005 by the data point for variable x on 1/30/2005, and compare that to the same metric for variable y. I need to repeat this calculation for each day in the database.

    I will therefore essentially need to be constantly polling the database for the data points associated with a particular day. I'm thinking it will be much more efficient to load it all into hash structures at the outset and then just loop over those.

    Do I need to set up a separate hash structure for each variable (i.e. %variableone, %variabletwo, etc.) in order to access the information (code example: $datapoint = $variableone{'1/20/05'})? Or is there some way I could set up a matrix-like data structure in Perl that would mimic an SQL structure (code example: $datapoint = $data{'1/20/05'}->variableone)?

    Thanks.
      It isn't necessary to keep a separate hash variable; that's your choice. They can all live in one hash ref, e.g. $href->{'01/01/2005'}{'variableone'} = $val1 or $href->{'01/30/2005'}{'variabletwo'} = $val2.
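
      A minimal sketch of that layout, assuming a hypothetical DBI/SQLite database with a table of (day, name, value) rows; the file, table, and column names here are made up:

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect('dbi:SQLite:dbname=data.db', '', '',
                                 { RaiseError => 1 });

          # Load everything into one hash of hashes:
          #   $data{$day}{$variable} = $value
          my %data;
          my $sth = $dbh->prepare('SELECT day, name, value FROM datapoints');
          $sth->execute;
          while (my ($day, $name, $value) = $sth->fetchrow_array) {
              $data{$day}{$name} = $value;
          }

          # The metric described above: variable x's ratio between two
          # days, compared against variable y's ratio for the same days.
          my $x_ratio = $data{'1/1/2005'}{'variableone'}
                      / $data{'1/30/2005'}{'variableone'};
          my $y_ratio = $data{'1/1/2005'}{'variabletwo'}
                      / $data{'1/30/2005'}{'variabletwo'};
          print "x outgrew y\n" if $x_ratio > $y_ratio;

      After the single pass over the database, every lookup is an in-memory hash access, so iterating over all the days is cheap.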

      -imran