in reply to Re^2: speeding up row by row lookup in a large db
in thread speeding up row by row lookup in a large db

Was the cell table composed of 100,000 or 1,000,000 rows? If 100,000, then by all means keep it in memory as a hash (i.e. read the whole thing only once). That might not be a bad idea even if it's a million. And obviously do the same with the tiny lc and dist tables.
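For instance, a minimal sketch of slurping the cell table once at startup — the database file name and column names here are only placeholders for whatever the real schema uses:

    use DBI;

    my $dbh = DBI->connect( "dbi:SQLite:dbname=met.db", "", "",
                            { RaiseError => 1 } );

    # one pass over the cell table; every lookup after this is a hash fetch
    my %cell;
    my $sth = $dbh->prepare("SELECT cell_id, lat, long, lc_id, dist_id FROM cell");
    $sth->execute;
    while ( my $row = $sth->fetchrow_arrayref ) {
        my ( $id, @attrs ) = @$row;
        $cell{$id} = [@attrs];
    }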

Consider storing the data in each table as a BLOB, but pack it first and store the packed entity as the BLOB. IIRC, SQLite keeps data as strings, so it's more voluminous than the actual data, and then you have the cost of all those atoi's to decode it.
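A rough sketch of what I mean, assuming each weather row is 9 numbers — the 'd9' template, the met_blob table, and the @weather_rows / $set_id variables are only illustrative:

    use DBI qw(:sql_types);

    # pack the whole 7300 x 9 set into one binary string
    my $packed = join '', map { pack 'd9', @$_ } @weather_rows;

    my $ins = $dbh->prepare("INSERT INTO met_blob (set_id, data) VALUES (?, ?)");
    $ins->bind_param( 1, $set_id );
    $ins->bind_param( 2, $packed, SQL_BLOB );
    $ins->execute;

    # reading it back needs no string-to-number conversion at all
    my ($blob) = $dbh->selectrow_array(
        "SELECT data FROM met_blob WHERE set_id = ?", undef, $set_id );
    my @flat = unpack 'd*', $blob;    # 7300 * 9 doubles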

And if your data can be organized by lat-long, consider a small database for each lat-long region.
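Something along these lines, where the 10-degree tiling and the file naming are purely hypothetical:

    # bucket cells into 10-degree lat/long tiles, one SQLite file per tile
    sub dbh_for_region {
        my ( $lat, $long ) = @_;
        my $region = sprintf "%d_%d", int( $lat / 10 ), int( $long / 10 );
        return DBI->connect( "dbi:SQLite:dbname=met_$region.db", "", "",
                             { RaiseError => 1 } );
    }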

Also, consider the CPAN module Cache::Memcached.
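A minimal check-the-cache-first sketch; the server address, the key scheme, and the load_set_from_db() helper are assumptions:

    use Cache::Memcached;

    my $memd = Cache::Memcached->new( { servers => ['127.0.0.1:11211'] } );

    sub weather_for_cell {
        my ($cell_id) = @_;
        my $set = $memd->get("met:$cell_id");
        return $set if $set;                   # cache hit
        $set = load_set_from_db($cell_id);     # your existing SQLite lookup
        $memd->set( "met:$cell_id", $set );    # warm the cache for next time
        return $set;
    }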


Re^4: speeding up row by row lookup in a large db
by punkish (Priest) on Mar 23, 2009 at 02:09 UTC
    So, the cell table has 1 million rows. Each cell has (in the current study) a 7300-row by 9-column table of weather data. There are 400 sets of weather data, so there is some level of normalization.

    I serialized and compressed the weather data (using Storable via Data::Serializer) and stored each set as a BLOB. Unfortunately, the cost of extracting the BLOB, de-serializing it, and uncompressing it is quite high... about 28 ms per set, which is really slow. In fact, it is quicker to simply pull the 7300 rows out of the table than to pull out one BLOB row and unravel it.
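    For reference, the round trip looks roughly like this; the serializer options are my reconstruction of the setup described above, and $weather_set stands in for one 7300 x 9 data set:

        use Data::Serializer;

        my $ser = Data::Serializer->new(
            serializer => 'Storable',
            compress   => 1,
        );

        # store: one BLOB per 7300 x 9 weather set
        my $blob = $ser->serialize($weather_set);

        # retrieve: this step, plus the BLOB fetch itself, is the ~28 ms per set
        my $weather_set_back = $ser->deserialize($blob);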

    I will look at Cache::Memcached.

    --

    when small people start casting long shadows, it is time to go to bed
      So, the cell table has 1 million rows. Each cell has (in the current study), a 7300 rows by 9 columns table of weather data. There are 400 sets of weather data, so there is some level of normalization.

      Are you saying that each of the million CELL rows references one of 400 sets of weather data (each of which is 7300x9)?

      And that for each CELL you wish to load the appropriate weather dataset?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Are you saying that each of the million CELL rows references one of 400 sets of weather data (each of which is 7300x9)? And that for each CELL you wish to load the appropriate weather dataset?

        Yes. Fortunately not all at the same time, but I want to reduce the time to load *all* the data for _a_ cell. The time I am getting right now is around 30 ms.

        Update: Because, in the next iteration of the project, the number of cells is going to jump to about 1.9 million, and the size of each weather data set might increase (currently we have 20 years of data, hence 20*365 = 7300 rows). Hence all the focus on reducing the time to retrieve all the data for any given cell.

        --

        when small people start casting long shadows, it is time to go to bed