Re: Data Set Combination

Do the rows always correspond one-to-one, except in the case where data is missing from one or the other?

If so, a solution is straightforward. Load both data sets into hashes keyed the same way, then iterate over the first: any key with no counterpart in the second hash exists only in the first. If you delete each matched element from the second hash as you find it, whatever is left in the second hash afterwards is the set of items that weren't in the first hash.
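A minimal sketch of that delete-as-you-match idea, written in Python rather than Perl purely for illustration (a dict plays the role of the hash). The function name `split_unmatched`, the date-string keys, and the sample values are all made up for the example; they are not from the original question.

```python
def split_unmatched(first, second):
    """Return (only_in_first, only_in_second) for two dicts that share a key scheme."""
    remaining = dict(second)  # work on a copy so the caller's data is left intact
    only_in_first = {}
    for key, row in first.items():
        if key in remaining:
            del remaining[key]        # matched: remove it from the second set
        else:
            only_in_first[key] = row  # no counterpart in the second set
    # Whatever was never deleted exists only in the second set.
    return only_in_first, remaining


# Hypothetical sample data, keyed by date string.
market_a = {"2005-08-22": 10.0, "2005-08-23": 10.5, "2005-08-25": 11.0}
market_b = {"2005-08-22": 20.0, "2005-08-24": 19.5, "2005-08-25": 21.0}

only_a, only_b = split_unmatched(market_a, market_b)
```

With this sample data, `only_a` holds the 2005-08-23 row and `only_b` holds the 2005-08-24 row. The same shape carries over to Perl with `exists` and `delete` on a hash.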

If there can be a many-to-one correspondence between the two, you can pull the first array into a hash with selectall_hashref, find elements in the first but not the second while iterating over the first array, then iterate over the second array to search for items which aren't in the first.

You can also have the SQL server do most of the work for you by JOINing the tables in an appropriate way. For example, SELECT ... FROM table1 LEFT JOIN table2 ... will return every row of table1, joined with the matching data from table2 where there is any, and with NULL in the table2 columns where there is not. SELECT ... FROM table2 LEFT JOIN table1 WHERE ... AND table1.something IS NULL will return all rows in table2 with no corresponding row in table1. Maybe a better SQL hacker than I am would know a single query that does the whole thing efficiently.
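To make the two LEFT JOIN queries concrete, here is a small self-contained sketch using Python's built-in `sqlite3` module and an in-memory database. The column names `day` and `price`, the join condition, and the sample rows are assumptions for the example; the real schema and the elided parts of the queries above are unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (day TEXT PRIMARY KEY, price REAL);
    CREATE TABLE table2 (day TEXT PRIMARY KEY, price REAL);
    INSERT INTO table1 VALUES ('2005-08-22', 10.0), ('2005-08-23', 10.5);
    INSERT INTO table2 VALUES ('2005-08-22', 20.0), ('2005-08-24', 19.5);
""")

# Every row of table1; table2's price is NULL (None) where there is no match.
all_of_table1 = conn.execute("""
    SELECT table1.day, table1.price, table2.price
    FROM table1 LEFT JOIN table2 ON table1.day = table2.day
    ORDER BY table1.day
""").fetchall()

# Rows of table2 that have no corresponding row in table1.
only_in_table2 = conn.execute("""
    SELECT table2.day, table2.price
    FROM table2 LEFT JOIN table1 ON table1.day = table2.day
    WHERE table1.day IS NULL
""").fetchall()
```

Here `all_of_table1` contains both table1 days, with `None` as the table2 price for 2005-08-23, and `only_in_table2` contains just the 2005-08-24 row.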

Re^2: Data Set Combination by Anonymous Monk on Aug 26, 2005 at 20:07 UTC
No, the data sets are from different markets, with different values. What I want to do is create a third data set by performing calculations on the values of each set and putting the results in the new data set. If a row exists in one set but not the other, I would simply take the previous row from the set that is missing it and perform the calculations using those values. So joining the values via SQL statements is not what I am looking for. Thanks for your input though.
Re^3: Data Set Combination by sgifford (Prior) on Aug 26, 2005 at 20:21 UTC
Ah, I guess I don't understand your problem then. So you're saying if you have a missing day in one of your data sets, you want to reuse the previous day's data for calculations on that row? If so, sort the two data sets by date, then move through both in parallel. Track the "current row" and "next row" for both data sets, and loop over a series of dates. For each data set, if the date of "next row" is equal to the date you're currently looking at, set "current row" to "next row" and read another item into "next row". Then perform calculations based on each data set's "current row", which will contain either the data from the current day or the data from the last day for which data is available. A missing first row would have to be handled as a special case. Hope this helps; no time to write up any sample code right now.
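A rough sketch of the parallel walk described above, in Python for illustration only. The names `Cursor` and `combine`, the `(date, value)` row shape, the sample data, and the calculation (a simple difference) are all assumptions. Two deliberate choices: rows are promoted when their date is on or before the day being looked at, a slight generalization of the "equal to" test so that the walk still works if the loop starts after the first row, and days on which either set has no data yet (the missing-first-row case) are simply skipped.

```python
from datetime import date, timedelta


class Cursor:
    """Tracks the "current row" and "next row" of one date-sorted data set."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.current = None                     # nothing seen yet
        self.upcoming = next(self._rows, None)  # the "next row"

    def advance_to(self, day):
        # Promote "next row" to "current row" while it is dated on or before `day`.
        while self.upcoming is not None and self.upcoming[0] <= day:
            self.current = self.upcoming
            self.upcoming = next(self._rows, None)
        return self.current


def combine(rows_a, rows_b, start, end, calc):
    """Apply calc(value_a, value_b) for each day, carrying the last known row forward."""
    a, b = Cursor(rows_a), Cursor(rows_b)
    result = []
    day = start
    while day <= end:
        row_a, row_b = a.advance_to(day), b.advance_to(day)
        if row_a is not None and row_b is not None:
            result.append((day, calc(row_a[1], row_b[1])))
        # Otherwise one set has no row yet: the special case, skipped here.
        day += timedelta(days=1)
    return result


# Hypothetical sample data: market_a lacks Aug 24, market_b lacks Aug 23.
market_a = [(date(2005, 8, 22), 10.0), (date(2005, 8, 23), 10.5), (date(2005, 8, 25), 11.0)]
market_b = [(date(2005, 8, 22), 20.0), (date(2005, 8, 24), 19.5), (date(2005, 8, 25), 21.0)]

spread = combine(market_a, market_b, date(2005, 8, 22), date(2005, 8, 25),
                 lambda x, y: y - x)
```

For this data, `spread` has one entry per day from Aug 22 to Aug 25 with values 10.0, 9.5, 9.0 and 10.0: Aug 23 reuses market_b's Aug 22 row, and Aug 24 reuses market_a's Aug 23 row. Note that the loop visits every calendar day; if days on which neither market traded should be dropped, that needs an extra check.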