in reply to Delete from array

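This is a problem I tackle daily (in Python). After I had implemented the two-way compare you mentioned, I longed for the remaining two classes of elements as well, and ended up with four classes of entries for two lists:

  1. Entries found in both lists that are equal
  2. Entries found in both lists that are different
  3. Entries only in list A
  4. Entries only in list B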

In your problem, the second class would always be empty, as you have only the name of a file. If you also compared, for example, the size of a file, the name would still be the unique identifier, but two files could have the same name and still differ in size.
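To make the difference between the key and the closer comparison concrete, here is a minimal sketch. The record layout (hashes with a name and a size) and the names file_key and same_file are made up for illustration; nothing here is taken from your code.

```perl
use strict;
use warnings;

# Hypothetical file records: the name identifies a file, the size is extra data.
my $old = { name => 'report.txt', size => 1024 };
my $new = { name => 'report.txt', size => 2048 };

# The key decides which items get paired up at all.
sub file_key {
    my ($file) = @_;
    return $file->{name};
}

# The comparison decides whether a pair counts as "equal" or "different".
sub same_file {
    my ($x, $y) = @_;
    return $x->{size} == $y->{size};
}

my $paired = file_key($old) eq file_key($new);
my $equal  = same_file($old, $new);
print "paired: ", ($paired ? "yes" : "no"), "\n";
print "equal:  ", ($equal  ? "yes" : "no"), "\n";
```

The two records pair up by name but differ in size, so they would land in the second class.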

I looked around, but could not find a Perl module for comparing two arrays and dividing them up into the four classes (I found nothing comparable in Python either, but with Python, I'm used to writing my own stuff :-)).

So here is my algorithm:

  1. We need a method to extract some semblance of a key from each item.
  2. Since we do not know whether each element has a unique key, we must compare bags of items that share the same key. We can select any item (e.g. the first) from a bag to pair.
  3. Read all items from list A, and put them in a hash. The key is the key extracted for each item. As this key is not unique, the hash entries will be lists of items.
  4. For each item in list B, look in the hash :
    • If there is an item in the hash entry, remove that item and put both items in the "found" part of the result list.
    • If there is no item in the hash entry, put the item into the "only in list B" part of the result list.
  5. Put all remaining items from the hash into the "only in list A" part of the result list.
  6. Now divide the "found" part up into "equal" and "different" parts by comparing the paired items more closely.
  7. Return the "equal", "different", "only in list A" and "only in list B" parts.

Work interferes, so I won't write up the implementation - watch this space for an update

Update (untested, though):

=pod

extract_key takes an element from a list and returns a scalar that is
the key element. Think of MD5.

=cut

use strict;
use warnings;

sub extract_key {
    # blindly return the item itself, stringified.
    return "@_";
}

sub compare_items {
    # plain string identity comparison
    return $_[0] eq $_[1];
}

sub compare_lists {
    my ($list_a, $list_b) = @_;
    my %dict;
    my %result = (
        equal     => [],
        different => [],
        only_a    => [],
        only_b    => [],
    );

    # Collect the items of list A in bags, one bag per key.
    for my $item_a (@$list_a) {
        my $key = extract_key($item_a);
        push @{ $dict{$key} }, $item_a;
    }

    # Pair each item of list B with the first unpaired item of its bag.
    my @found;
    for my $item_b (@$list_b) {
        my $key = extract_key($item_b);
        if ($dict{$key} and @{ $dict{$key} }) {
            my $item_a = shift @{ $dict{$key} };
            push @found, [ $item_a, $item_b ];
        } else {
            # no bag for this key, or the bag is already used up
            push @{ $result{only_b} }, $item_b;
        }
    }

    # Whatever was never paired exists only in list A.
    push @{ $result{only_a} }, @{ $dict{$_} } for keys %dict;

    # Split the pairs into "equal" and "different" by a closer comparison.
    for my $pair (@found) {
        if (compare_items( $pair->[0], $pair->[1] )) {
            push @{ $result{equal} }, $pair;
        } else {
            push @{ $result{different} }, $pair;
        }
    }

    return %result;
}
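For a quick check of the four classes on the file name and size example, here is a self-contained sketch. It is a condensed variant, not the code above: the name classify, the code-reference parameters for key and comparison, and the sample data are all my assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical variant: key extraction and comparison are passed in as code refs.
sub classify {
    my ($list_a, $list_b, $key_of, $is_equal) = @_;
    my %result = ( equal => [], different => [], only_a => [], only_b => [] );

    # Bags of list A items, one bag per key.
    my %bags;
    push @{ $bags{ $key_of->($_) } }, $_ for @$list_a;

    for my $item_b (@$list_b) {
        my $bag = $bags{ $key_of->($item_b) };
        if ($bag and @$bag) {
            # Pair with the first unpaired item, then classify the pair.
            my $item_a = shift @$bag;
            my $class  = $is_equal->($item_a, $item_b) ? 'equal' : 'different';
            push @{ $result{$class} }, [ $item_a, $item_b ];
        } else {
            push @{ $result{only_b} }, $item_b;
        }
    }

    # Items left in the bags were never paired.
    push @{ $result{only_a} }, @{ $bags{$_} } for sort keys %bags;
    return \%result;
}

# Made-up sample data: [ name, size ]
my @list_a = ( [ 'a.txt', 10 ], [ 'b.txt', 20 ], [ 'c.txt', 30 ] );
my @list_b = ( [ 'a.txt', 10 ], [ 'b.txt', 25 ], [ 'd.txt', 40 ] );

my $result = classify(
    \@list_a, \@list_b,
    sub { $_[0][0] },               # key: the file name
    sub { $_[0][1] == $_[1][1] },   # comparison: the file size
);

for my $class (qw(equal different only_a only_b)) {
    # Paired classes hold [ item_a, item_b ]; the other two hold plain items.
    my @names = map { ref $_->[0] ? $_->[0][0] : $_->[0] } @{ $result->{$class} };
    print "$class: @names\n";
}
```

With this data, a.txt ends up in "equal", b.txt in "different", c.txt in "only_a" and d.txt in "only_b". Passing the key and comparison as code references is just one way to avoid editing extract_key and compare_items for every new kind of item.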

Re: Re: Delete from array
by dragonchild (Archbishop) on Aug 07, 2003 at 13:21 UTC
    (there also was nothing comparable in Python, but with Python, I'm used to writing my own stuff :-))

    In my mind, that would be a reason to avoid Python. *shrugs* TMTOWTDI means different languages too, I suppose.
