This is a problem I tackle daily (in Python). After I had implemented the two-way compare you mentioned, I longed for the remaining two classes of elements as well, and ended up with four classes of entries for two lists:
- Elements that are in both lists and are equal
- Elements that are in both lists and are not equal
- Elements that are only in list A
- Elements that are only in list B
In your problem, the second class would always be empty, as you have only the name of a file. If you also compared, for example, the size of each file, the name would still be the unique identifier, but two files could have the same name and still differ in size.
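To illustrate the difference between "same key" and "equal", here is a minimal sketch. It assumes each file is represented as a hash reference with hypothetical `name` and `size` fields; neither field name comes from your question.

```perl
use strict;
use warnings;

# Hypothetical records: the name is the key, the size is extra data.
my $file_a = { name => 'report.txt', size => 1024 };
my $file_b = { name => 'report.txt', size => 2048 };

# Same key, so the two entries get paired ...
my $same_key = $file_a->{name} eq $file_b->{name};

# ... but the paired entries are not equal, which puts them in the second class.
my $equal = $same_key && $file_a->{size} == $file_b->{size};

print "same key: ", ($same_key ? "yes" : "no"), "\n";
print "equal:    ", ($equal    ? "yes" : "no"), "\n";
```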
I looked around, but I could not find a module for comparing two arrays and dividing them up into the four classes in Perl (I found nothing comparable in Python either, but with Python, I'm used to writing my own stuff :-)).
So here is the algorithm I use:
- We need a method to extract some semblance of a key from each item.
- Since we do not know whether the key is unique for each element, we must compare bags of items that share the same key. We can select any item (e.g. the first) from a bag to pair.
- Read all items from list A, and put them in a hash. The hash key is the key extracted from each item. As this key is not necessarily unique, the hash entries are lists of items.
- For each item in list B, look in the hash:
  - If there is an item in the hash entry, remove that item and put both items in the "found" part of the result list.
  - If there is no item in the hash entry (or no entry at all), put the item into the "only in list B" part of the result list.
- Put all remaining items from the hash into the "only in list A" part of the result list.
- Now divide the "found" part up into "equal" and "different" parts by comparing the paired items more closely.
- Return the "equal", "different", "only in list A" and "only in list B" parts.
Work interferes, so I won't write up the implementation now. Watch this space for an update.
Update (untested though):

=pod

extract_key takes an element from a list and returns a scalar that is
the key element. Think of MD5.

=cut

sub extract_key {
    # blindly return the item itself, stringified.
    return "@_";
}

sub compare_items {
    # plain string identity comparison
    return $_[0] eq $_[1];
}

sub compare_lists {
    my ($list_a, $list_b) = @_;
    my %dict;
    my %result = (
        equal     => [],
        different => [],
        only_a    => [],
        only_b    => [],
    );

    # put every item of list A into a bag (array) stored under its key
    for my $item_a (@$list_a) {
        my $key = extract_key($item_a);
        push @{ $dict{$key} }, $item_a;
    }

    my @found;
    for my $item_b (@$list_b) {
        my $key = extract_key($item_b);
        if ($dict{$key} and @{ $dict{$key} }) {
            my $item_a = shift @{ $dict{$key} };
            push @found, [ $item_a, $item_b ];
        } else {
            # no unpaired item with this key is left from list A
            push @{ $result{only_b} }, $item_b;
        }
    }

    # whatever is still in the bags was never matched by list B
    push @{ $result{only_a} }, @{ $dict{$_} } for keys %dict;

    for my $pair (@found) {
        if (compare_items( $pair->[0], $pair->[1] )) {
            push @{ $result{equal} }, $pair;
        } else {
            push @{ $result{different} }, $pair;
        }
    }
    return %result;
}
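As a usage illustration, here is a self-contained sketch of the same idea applied to the file name plus size case. It is not the code above verbatim: `four_way_compare` is a hypothetical variant that takes the key extractor and the equality test as code references, and the `name` and `size` fields are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical variant of the four-way compare: the key extractor and
# the equality test are passed in as code references.
sub four_way_compare {
    my ($list_a, $list_b, $key_of, $is_equal) = @_;
    my %result = ( equal => [], different => [], only_a => [], only_b => [] );

    # Bag the items of list A by key.
    my %bag;
    push @{ $bag{ $key_of->($_) } }, $_ for @$list_a;

    # Pair each item of list B with an unpaired item of list A, if any.
    for my $item_b (@$list_b) {
        my $items = $bag{ $key_of->($item_b) };
        if ($items and @$items) {
            my $item_a = shift @$items;
            my $class  = $is_equal->($item_a, $item_b) ? 'equal' : 'different';
            push @{ $result{$class} }, [ $item_a, $item_b ];
        } else {
            push @{ $result{only_b} }, $item_b;
        }
    }

    # Items still in a bag have no partner in list B.
    push @{ $result{only_a} }, @{ $bag{$_} } for sort keys %bag;
    return \%result;
}

my @dir_a = (
    { name => 'a.txt', size => 10 },
    { name => 'b.txt', size => 20 },
    { name => 'c.txt', size => 30 },
);
my @dir_b = (
    { name => 'a.txt', size => 10 },
    { name => 'b.txt', size => 25 },
    { name => 'd.txt', size => 40 },
);

my $r = four_way_compare(
    \@dir_a, \@dir_b,
    sub { $_[0]{name} },                   # key: the file name
    sub { $_[0]{size} == $_[1]{size} },    # equal: same size
);

# "equal" and "different" hold pairs; "only_a" and "only_b" hold single items.
for my $class (qw(equal different)) {
    print "$class: ", join(', ', map { $_->[0]{name} } @{ $r->{$class} }), "\n";
}
for my $class (qw(only_a only_b)) {
    print "$class: ", join(', ', map { $_->{name} } @{ $r->{$class} }), "\n";
}
```

With this sample data, `a.txt` should land in "equal", `b.txt` in "different", `c.txt` in "only_a" and `d.txt` in "only_b". Passing the two callbacks as arguments is just one design choice; overriding `extract_key` and `compare_items` as in the code above works as well.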
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web