Re: Contemplating some set comparison tasks

Honestly, my gut reaction is to use a database to do this, especially since you say you may have to repeat/scale in the future.

Use Perl to get the values into a DB, then run queries to trim things down the way you want...
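A minimal sketch of that load-then-query workflow. It assumes a hypothetical table `members(set_id, value)` in which each row says "this value belongs to this set"; the real schema wasn't posted. It uses Python's built-in `sqlite3` module as a stand-in for Perl DBI talking to PostgreSQL. The SQL itself is standard and should carry over.

```python
import sqlite3

def load_members(conn, rows):
    """Create the hypothetical members table and bulk-load (set_id, value) pairs."""
    conn.execute("CREATE TABLE members (set_id TEXT NOT NULL, value TEXT NOT NULL)")
    conn.executemany("INSERT INTO members (set_id, value) VALUES (?, ?)", rows)
    conn.commit()

def only_in(conn, set_a, set_b):
    """Return the distinct values in set_a that do not appear in set_b, sorted.

    EXCEPT is a set difference done by the DB engine; it also removes duplicates.
    """
    cur = conn.execute(
        """
        SELECT value FROM members WHERE set_id = ?
        EXCEPT
        SELECT value FROM members WHERE set_id = ?
        ORDER BY value
        """,
        (set_a, set_b),
    )
    return [row[0] for row in cur]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_members(conn, [("A", "x"), ("A", "y"), ("A", "z"), ("B", "y")])
    print(only_in(conn, "A", "B"))  # ['x', 'z']
```

The point is that the "trimming" happens inside the query, so only the rows you actually want ever come back to the script.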

Is a DB out of the question for some reason?

...the majority is always wrong, and always the last to know about it...

Insanity: Doing the same thing over and over again and expecting different results...

A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact.

Re^2: Contemplating some set comparison tasks by dwhite20899 (Friar) on Aug 08, 2014 at 19:26 UTC
It's in a PostgreSQL DB, actually. I'm less familiar with SQL: the SELECT DISTINCTs don't scare me, but I get a little spooked by GROUP BYs, and I'm wary of doing some kind of JOIN of this table with itself. If I can come to grips with the concept I want to accomplish (and Perl is the tool I'm hoping will make that easiest), I ought to be able to work out the SQL solution, and then, you're right, the DB engine would probably be best.
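To make a self-JOIN plus GROUP BY less spooky, here is a small sketch against a hypothetical `members(set_id, value)` table (an assumption, since the real table wasn't shown), again using Python's `sqlite3` in place of PostgreSQL. Joining the table to itself on `value` produces one row per pair of sets that share a value; GROUP BY then collapses those rows into one count per pair.

```python
import sqlite3

# m1 and m2 are two aliases for the same table. Matching on value yields one
# row per (set, other set, shared value). The m1.set_id < m2.set_id condition
# keeps each unordered pair once and stops a set from matching itself.
# COUNT(DISTINCT ...) guards against duplicate rows inflating the count.
SHARED_COUNTS = """
    SELECT m1.set_id AS set_a, m2.set_id AS set_b,
           COUNT(DISTINCT m1.value) AS shared
    FROM members AS m1
    JOIN members AS m2
      ON m1.value = m2.value
     AND m1.set_id < m2.set_id
    GROUP BY m1.set_id, m2.set_id
    ORDER BY m1.set_id, m2.set_id
"""

def shared_counts(rows):
    """Load (set_id, value) pairs into an in-memory DB and count shared values per set pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (set_id TEXT NOT NULL, value TEXT NOT NULL)")
    conn.executemany("INSERT INTO members VALUES (?, ?)", rows)
    return conn.execute(SHARED_COUNTS).fetchall()

if __name__ == "__main__":
    # A={x,y}, B={y,z}, C={x,y}
    rows = [("A", "x"), ("A", "y"), ("B", "y"), ("B", "z"), ("C", "x"), ("C", "y")]
    for set_a, set_b, shared in shared_counts(rows):
        print(set_a, set_b, shared)
```

Because this is an inner join, pairs of sets with nothing in common simply don't appear in the output; listing them with a count of zero would need an outer join against the list of set pairs.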
Re^3: Contemplating some set comparison tasks by Anonymous Monk on Aug 08, 2014 at 19:40 UTC
You said you're looking for something efficient. For a relatively large data set like this, having the DB do the work should almost always be more efficient than reading the data into Perl and doing the work there. If you're worried about getting the queries right, how about setting up some test cases? Build a few minimal data sets for which you can figure out the desired output "by hand" for each query you want to run, then work on the queries until they match that output.
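One way to organize such test cases, sketched under the assumption of a hypothetical `members(set_id, value)` table and using Python's `sqlite3` rather than PostgreSQL: each fixture pairs a tiny data set with the answer worked out by hand, and a helper reports every fixture the query gets wrong. The query being tested ("values that appear in every set") is only an illustrative example, not the original poster's actual task.

```python
import sqlite3

# Query under test: values that appear in every set in the table.
COMMON_TO_ALL = """
    SELECT value
    FROM members
    GROUP BY value
    HAVING COUNT(DISTINCT set_id) = (SELECT COUNT(DISTINCT set_id) FROM members)
    ORDER BY value
"""

# Each fixture: (rows, expected answer worked out by hand).
FIXTURES = [
    # A={x,y}, B={x,y}, C={x}: only x is in all three sets
    ([("A", "x"), ("A", "y"), ("B", "y"), ("B", "x"), ("C", "x")], ["x"]),
    # disjoint sets: nothing is common
    ([("A", "x"), ("B", "y")], []),
    # a duplicate row in A must not change the answer
    ([("A", "x"), ("A", "x"), ("B", "x")], ["x"]),
]

def run_query(rows, sql):
    """Load rows into a fresh in-memory members table and return the first column of sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (set_id TEXT NOT NULL, value TEXT NOT NULL)")
    conn.executemany("INSERT INTO members VALUES (?, ?)", rows)
    return [r[0] for r in conn.execute(sql)]

def check(sql):
    """Return (rows, got, expected) for every fixture the query gets wrong."""
    failures = []
    for rows, expected in FIXTURES:
        got = run_query(rows, sql)
        if got != expected:
            failures.append((rows, got, expected))
    return failures

if __name__ == "__main__":
    failures = check(COMMON_TO_ALL)
    print("all fixtures pass" if not failures else failures)
```

The duplicate-row fixture earns its place: a version of the query that uses `COUNT(set_id)` instead of `COUNT(DISTINCT set_id)` in the HAVING clause passes the first two fixtures but fails that one.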
Re^4: Contemplating some set comparison tasks by dwhite20899 (Friar) on Aug 08, 2014 at 19:45 UTC
Yeah, I think I can set up a table with a test set and make sure it has data that will yield results I can understand, given the SQL. Dang, I should've thought of that. Thanks!