The basic algorithm you're looking for is:
my @list_of_stuff = get_my_list();
foreach my $i (0 .. $#list_of_stuff) {
    foreach my $j (0 .. $#list_of_stuff) {
        next if $i == $j;    # don't compare an item with itself
        do_compare($list_of_stuff[$i], $list_of_stuff[$j]);
    }
}
This algorithm will be very slow, especially if you're comparing more than 15-20 things. Remember, you're doing N * (N - 1) comparisons. So:
Things    Comparisons
     2              2
     3              6
     4             12
     5             20
    10             90
    15            210
    20            380
    25            600
    50           2450
    75           5550
   100           9900
It might be useful to do this kind of comparison on subsets of your data, then compare typical items from each subset with each other. So, if you could break 100 items down into 10 subsets of 10, each subset takes 10 * 9 = 90 comparisons (900 in all), and comparing the typical item from each subset against the others adds another 90, so you reduce 9900 comparisons to 990. That's a 90% savings in time - both for the computer and for you as the user. (Remember, you are the one who has to deal with these comparisons.) A rough sketch of the idea is below.
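Here's a minimal, untested sketch of that idea, in the same shape as the loop above. The group_key() function is a hypothetical placeholder for whatever cheap property your items can be bucketed on (size, checksum, first line, whatever fits your data):

my @list_of_stuff = get_my_list();

# Partition the items into subsets, keyed on some cheap property.
my %subset;
foreach my $item (@list_of_stuff) {
    push @{ $subset{ group_key($item) } }, $item;
}

# Full pairwise comparison, but only within each (small) subset.
foreach my $key (keys %subset) {
    my @items = @{ $subset{$key} };
    foreach my $i (0 .. $#items) {
        foreach my $j (0 .. $#items) {
            next if $i == $j;
            do_compare($items[$i], $items[$j]);
        }
    }
}

# Then compare one typical item from each subset against the rest.
my @typical = map { $subset{$_}[0] } keys %subset;
foreach my $i (0 .. $#typical) {
    foreach my $j (0 .. $#typical) {
        next if $i == $j;
        do_compare($typical[$i], $typical[$j]);
    }
}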
Of course, using another program to wade through the comparisons and discard the uninteresting ones can also be handy. I've done that many times. Where I work right now, we have a process that generates a set of logs. I have several scripts that do analysis on those logs and double-check the process's work. I even have a script that analyzes the results of the log analyzers. :-)
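For instance, if do_compare() returned a numeric similarity score - an assumption on my part; nothing above says it does - a first pass that discards the boring results could be as simple as:

my $threshold = 0.8;    # hypothetical cutoff - tune it for your data

foreach my $i (0 .. $#list_of_stuff) {
    foreach my $j ($i + 1 .. $#list_of_stuff) {
        my $score = do_compare($list_of_stuff[$i], $list_of_stuff[$j]);
        print "items $i and $j look alike (score $score)\n"
            if $score >= $threshold;
    }
}

Note the inner loop starts at $i + 1, so each pair is reported only once; if your comparison isn't symmetric, keep the full double loop from above instead.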
As to your second question - depending on the size of the things you're working with, you might not have enough memory to read everything in. Often, keeping those things on disk and reading them in when you want to deal with them is the appropriate thing to do. You might have to read things in over and over, but that's ok.
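A sketch of that, assuming one item per file (the data/*.txt layout is made up - point glob() at wherever your items actually live):

# Slurp a whole file into a scalar when, and only when, it's needed.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    local $/;    # undef the record separator: read the whole file
    return scalar <$fh>;
}

my @files = glob('data/*.txt');

foreach my $i (0 .. $#files) {
    my $first = slurp($files[$i]);       # hold one item...
    foreach my $j ($i + 1 .. $#files) {
        my $second = slurp($files[$j]);  # ...and one more at a time
        do_compare($first, $second);
    }
}

You pay for the repeated reads, but you never hold more than two items in memory at once.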
------
We are the carpenters and bricklayers of the Information Age.
The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.