in reply to Best way to store/access large dataset?

So the first file is just a lookup of the (up to 200) names for each column, and the second file has one column per name, where each line contains one boolean attribute for each item, and there can be 1 million attributes?

And you want to basically count the number of true attributes for each item?



Re^2: Best way to store/access large dataset?
by Speed_Freak (Sexton) on Jun 22, 2018 at 16:08 UTC

    Yes, and then sum the attribute counts by the category assigned to each item in the first file.

    So if I have 200 items spread equally across 5 categories, I need, for each of the 1 million attributes, a count of how many items in each category have it. For example, attribute 1 was found 6 times in the 40 items listed as category 1, 12 times in the 40 items listed as category 2, and so on, for every category and every attribute.
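
To make the per-category tally described above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `count_by_category`, `item_categories`, and `attribute_rows` are hypothetical, the toy data is made up, and the real file formats are not shown in this thread, so parsing is left out. It assumes the first file has already been reduced to one category label per item (column) and that the second file is read one attribute (line) at a time as a list of 0/1 flags.

```python
from collections import Counter


def count_by_category(item_categories, attribute_rows):
    """For each attribute row, yield a Counter mapping
    category -> number of items in that category with the flag set."""
    for row in attribute_rows:
        counts = Counter()
        # Pair each item's category with that item's flag for this attribute.
        for category, flag in zip(item_categories, row):
            if flag:
                counts[category] += 1
        yield counts


# Hypothetical toy data: 6 items in 2 categories, 3 attributes.
item_categories = ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2"]
attribute_rows = [
    [1, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]

results = list(count_by_category(item_categories, attribute_rows))
for index, counts in enumerate(results, start=1):
    print(index, dict(counts))
```

Because the function handles one attribute row at a time, the only state it holds per row is one small counter per category. In real use you would pass it a generator that reads the second file line by line and consume the results as they arrive, rather than collecting them in a list as the toy example does, so the 1 million rows never need to sit in memory together.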