Difference arrays.

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Difference arrays. by betterworld (Curate) on Sep 04, 2008 at 19:30 UTC
```perl
my @p = (1,1,1,1,1,2,2,2,3,3,4,5,6);
my @q = (1,2,3,4,5,6);
# my @p = (43, 43, 44);
# my @q = (43, 43);
my %q;
$q{$_}++ for @q;
my @r = grep { --$q{$_} < 0; } @p;
print join (',', @r), "\n";
```
Re^2: Difference arrays. by BrowserUk (Patriarch) on Sep 04, 2008 at 20:30 UTC
That's the one. I swear I tried that (at least twice!), but ... Many thanks++. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^2: Difference arrays. by Skeeve (Parson) on Sep 04, 2008 at 19:36 UTC
A similar idea to mine above, but much, much better than I did it! ++, and ++ again if I could vote twice! `s$$([},&%#}/&/]+}%&{});#$&&s&&$^X.($'^"%]=\&(\|?{%` `+`.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re^2: Difference arrays. by journey (Monk) on Sep 06, 2008 at 02:34 UTC
My goodness, it took me a while to get that one! Other slow brains like mine can have a look at this grep tutorial (where hashes are incremented): http://www.hidemail.de/blog/perl_tutor.shtml Also have a look at the PM node Re: How do I un-map this code?
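For anyone else puzzling over the `grep { --$q{$_} < 0 }` trick, here is a minimal sketch of the same counting idea wrapped in a sub and commented step by step. The sub name `multiset_diff` and the `%budget` hash are made up for this illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): returns the elements of
# @$p left over after cancelling one occurrence for each element of @$q.
sub multiset_diff {
    my ( $p, $q ) = @_;

    # Count how many times each value occurs in the second list.
    my %budget;
    $budget{$_}++ for @$q;

    # Walk the first list in order. Each element uses up one unit of the
    # budget for its value; once the budget has gone negative, the element
    # has no partner left in @$q and is kept.
    return [ grep { --$budget{$_} < 0 } @$p ];
}

my $r = multiset_diff( [ 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6 ], [ 1 .. 6 ] );
print join( ',', @$r ), "\n";    # prints 1,1,1,1,2,2,3
```

The order of the first list is preserved, and only the second list is ever loaded into the hash, which is presumably why it suits a memory-conscious use.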
Re^2: Difference arrays. by Skeeve (Parson) on Sep 04, 2008 at 21:27 UTC
Big bug! I just found it when I was examining kyle's solution: swap p and q and your routine will fail! Sorry! Not a bug! My mistake!
Re^3: Difference arrays. by betterworld (Curate) on Sep 04, 2008 at 21:35 UTC
I noticed that your solution is symmetric because it has the `abs` in it. However, I was taking BrowserUk at his word: "those in the first not in the second". Given the restriction "one a proper subset of the other", it does not matter, and our solutions print essentially the same elements (maybe in a different order).
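To make the symmetric versus one-sided distinction concrete, here is a rough sketch. Skeeve's actual code sits behind a readmore and is not shown in this thread, so `symmetric_diff` below is only a guess at the general shape (count up for one list, down for the other, then take `abs`); both sub names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT Skeeve's actual code: symmetric multiset difference.
sub symmetric_diff {
    my ( $p, $q ) = @_;
    my %count;
    $count{$_}++ for @$p;
    $count{$_}-- for @$q;
    # abs() makes the result independent of argument order.
    # Note that hash keys come back as strings.
    return [ map { ( $_ ) x abs( $count{$_} ) } sort { $a <=> $b } keys %count ];
}

# One-sided version: "those in the first not in the second".
sub one_sided_diff {
    my ( $p, $q ) = @_;
    my %count;
    $count{$_}++ for @$q;
    return [ grep { --$count{$_} < 0 } @$p ];
}

my @p = ( 43, 43, 44 );
my @q = ( 43, 43 );
print "sym(p,q): @{ symmetric_diff( \@p, \@q ) }\n";    # 44
print "sym(q,p): @{ symmetric_diff( \@q, \@p ) }\n";    # 44
print "one(p,q): @{ one_sided_diff( \@p, \@q ) }\n";    # 44
print "one(q,p): @{ one_sided_diff( \@q, \@p ) }\n";    # nothing
```

With the arguments in the order the question specifies, both give the same elements; they only part ways when the lists are swapped.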
Re^4: Difference arrays. by Skeeve (Parson) on Sep 04, 2008 at 21:38 UTC
Re: Difference arrays. by moritz (Cardinal) on Sep 04, 2008 at 19:29 UTC
Here's a not-quite-as-simple version that doesn't use a hash at all, and uses as little additional memory as possible. It exploits both the fact that one array is a proper subset of the other, and that the items appear in the same order in both arrays. If the latter is not the case you'd have to sort them, which kinda defeats the memory advantage.

```perl
use strict;
use warnings;

my @p = ( 1,1,1,1,1,2,2,2,3,3,4,5,6);
my @q = ( 1,2,3,4,5,6 );
my ($px, $qx) = (0, 0);
my @diff;
while (1) {
    if ($qx >= @q){
        push @diff, @p[$px .. @p-1];
        last;
    } elsif ( $p[$px] == $q[$qx] ) {
        $px++;
        $qx++;
    } else {
        push @diff, $p[$px++];
    }
}
print "p: @p\n";
print "q: @q\n";
print "d: @diff\n";
```

Update: that code can be simplified a bit:

```perl
while ($qx < @q) {
    if ( $p[$px] == $q[$qx] ) {
        $px++;
        $qx++;
    } else {
        push @diff, $p[$px++];
    }
}
push @diff, @p[$px .. @p-1];
```
Re: Difference arrays. by ikegami (Patriarch) on Sep 04, 2008 at 19:58 UTC
Since you're concerned about memory, you could do something like a Merge Sort.

Memory: O(1) (Not counting @a, @b and @c)
Speed: O(A+B) (Assuming @a and @b already sorted. As good as the other solutions)

```perl
my @a = sort { $a <=> $b } (43,43,44);
my @b = sort { $a <=> $b } (43,43);
my @c;
while (@a && @b) {
    if ($a[0] < $b[0]) {
        push @c, shift @a;
    } elsif ($a[0] > $b[0]) {
        die "Bad data";
    } else {
        shift @a;
        shift @b;
    }
}
push @c, $_ for @a;
die "Bad data" if @b;
```

A trivial change makes it non-destructive.

Update: Fixed bug mentioned in replies. Tested.
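The "trivial change" is not spelled out in the post. One way it might look, as a sketch only: walk both arrays with indexes instead of shifting elements off them. The sub name `merge_diff` and the array-reference interface are assumptions made for this example.

```perl
use strict;
use warnings;

# Sketch of a non-destructive merge-style difference. Assumes both inputs are
# sorted numerically and that @$sub is a sub-multiset of @$full.
sub merge_diff {
    my ( $full, $sub ) = @_;
    my ( $i, $j ) = ( 0, 0 );
    my @diff;
    while ( $i < @$full && $j < @$sub ) {
        if ( $full->[$i] < $sub->[$j] ) {
            push @diff, $full->[ $i++ ];    # no partner in @$sub
        }
        elsif ( $full->[$i] > $sub->[$j] ) {
            die "Bad data";                 # @$sub holds something @$full lacks
        }
        else {
            $i++;                           # matched pair: skip both
            $j++;
        }
    }
    die "Bad data" if $j < @$sub;
    push @diff, @{$full}[ $i .. $#{$full} ];    # whatever is left over
    return \@diff;
}

my @a = sort { $a <=> $b } ( 43, 43, 44 );
my @b = sort { $a <=> $b } ( 43, 43 );
my $c = merge_diff( \@a, \@b );
print "@$c\n";    # prints 44
```

Both input arrays are left untouched; the only extra storage is the two indexes and the result.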
Re^2: Difference arrays. by GrandFather (Saint) on Sep 04, 2008 at 21:39 UTC
That works better if the pops are shifts. .oO(How many times have I been caught by that!) Perl reduces RSI - it saves typing
Re^2: Difference arrays. by kyle (Abbot) on Sep 04, 2008 at 20:23 UTC
When I run this, I end up with... `@a = ( 43 ); @b = (); @c = ( 43 );` None of those is the OP's desired result, `( 44 )`. Am I missing something?
Re: Difference arrays. by kyle (Abbot) on Sep 04, 2008 at 19:52 UTC
Fun with DDT! My own solution pulled out of the <readmore>:

```perl
sub kyle {
    my ( $ref1, $ref2 ) = @_;
    my %h;
    $h{$_}++ for @{$ref1};
    $h{$_}-- for @{$ref2};
    my %x;
    return [ grep { $x{$_}++ < $h{$_} } @{$ref1} ];
}
```

I notice that Skeeve's has the bug mine had before I tested it. I'm happily surprised at how many don't have the bug I was expecting when I wrote the `[ $r1, ... ]` test. I, for one, wasn't shooting to optimize memory usage. I just wrote the first thing I thought of.
Re^2: Difference arrays. by Skeeve (Parson) on Sep 04, 2008 at 21:22 UTC
Okay... Mine has the bug that it stringifies the keys. Agreed. Yours has a bug too! It doesn't work in this case: `kyle( \@q, \@p );` No! Mine has the bug in that it doesn't care about what BrowserUk asked for!
Re^3: Difference arrays. by kyle (Abbot) on Sep 04, 2008 at 21:41 UTC
(OK, so as I was writing this, the node I'm replying to was updated. I'll post anyway...) I don't know what you mean. You've passed in the input arrays in the opposite order? The OP specifies that the second array is to be a proper subset of the first. If you reverse them, that violates this condition. So what would you expect to get back in that case? Mine returns (a reference to) an empty array. The solution from betterworld gets the same thing. The two solutions that moritz posted go into an infinite loop (apparently; I didn't exactly wait that long). The only other working solution, pjotrik's, gets something else. Anyway, here's an updated test script...
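The endless loop with moritz's code happens because the index into the first array runs off the end while the second array still has unmatched elements, so the loop keeps pushing undefined values. A minimal sketch of a guarded variant of the simplified loop follows; the sub name `ordered_diff` and the choice to die on bad input are assumptions for this illustration, not something moritz posted.

```perl
use strict;
use warnings;

# Hypothetical guarded variant of the two-pointer loop: dies instead of
# spinning forever when @$q is not an ordered sub-multiset of @$p.
sub ordered_diff {
    my ( $p, $q ) = @_;
    my ( $px, $qx ) = ( 0, 0 );
    my @diff;
    while ( $qx < @$q ) {
        # Guard: the first list is exhausted but the second is not.
        die "second list is not a subset of the first\n" if $px >= @$p;
        if ( $p->[$px] == $q->[$qx] ) {
            $px++;
            $qx++;
        }
        else {
            push @diff, $p->[ $px++ ];
        }
    }
    push @diff, @{$p}[ $px .. $#{$p} ];    # the unmatched tail of @$p
    return \@diff;
}

my @p = ( 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6 );
my @q = ( 1, 2, 3, 4, 5, 6 );
print "@{ ordered_diff( \@p, \@q ) }\n";    # prints 1 1 1 1 2 2 3

# Swapping the arguments violates the precondition and now fails fast.
my $ok = eval { ordered_diff( \@q, \@p ); 1 };
print "swapped: ", ( $ok ? "returned\n" : "died: $@" );
```

Whether dying, returning an empty list, or something else is the right response to swapped input is a design choice the thread leaves open.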
Re: Difference arrays. by Skeeve (Parson) on Sep 04, 2008 at 19:28 UTC
Since it's you, here is my suggestion: Update: My "solution" is now shamefully hidden in readmore tags because it doesn't give you what was asked for. It simply gives you (stringified) all elements which are in the first or the second array but not in both. If it weren't you, I would have cried H O M E W O R K! ;-) Update: Removed an overlooked "my $d". Thanks to betterworld.
Re^2: Difference arrays. by Anonymous Monk on Sep 04, 2008 at 19:53 UTC
"Examine what is said, not who speaks"
Re^3: Difference arrays. by kyle (Abbot) on Sep 04, 2008 at 20:29 UTC
"Examine what is said, not who speaks" Hey, I think I've seen that in someone's signature! Um, anyway... I can't speak for the other monks, but I've found content analysis to be expensive enough to be worth optimizing. As such, it's expedient to apply a simple memoization/caching technique whereby the source of some assertion influences the time and effort I spend evaluating the assertion based on past experiences with that source. It's true that a stopped clock is right twice every day, and even kooks can have insights worthy of deep consideration. Still, I don't want to waste my time looking at a clock I already know is broken any more than I want to waste my time pondering the rantings of someone whose rantings I've already pondered at a length greater than their value. I'm not talking about anyone in particular here. I just find the "ignore the source" meme a little irritating. Heuristics have their place.
Re: Difference arrays. by pjotrik (Friar) on Sep 04, 2008 at 20:22 UTC
Given the (very strict) restrictions (proper subset, sorted), my solution would be:

```perl
sub pjotrik {
    my ($a, $b) = @_;
    my $i = 0;
    return [ map { if ($i < @$b && $_ == $$b[$i]) { $i++; () } else { +$_ } } @$a ];
}
```

But the use of == makes it somewhat vulnerable. ~~ should improve that, but I have no experience with it.

UPDATE: Note that, like ikegami's solution, this is based on the idea of merging.
Re: Difference arrays. by pat_mc (Pilgrim) on Sep 04, 2008 at 22:34 UTC
Not sure if there is much point in submitting yet another solution. Still, here's my take on things:

```perl
my @a = ( 1,1,1,1,1,2,2,2,3,3,4,5,6);
my @b = ( 1,2,3,4,5,6 );
my %d;
my @difference;
$d{ $_ } ++ for @a;
$d{ $_ } -- for @b;
for my $key ( keys %d ) {
    next if ( $d{ $key } <= 0 );
    push @difference, $key for ( 1 .. $d{ $key } );
}
# Sort numerically; the default sort compares as strings.
print sort { $a <=> $b } @difference;
```

It is perspicuous and works. Comments? Pat
Re: Difference arrays. by dreadpiratepeter (Priest) on Sep 04, 2008 at 19:18 UTC
UPDATE: scratch that, didn't read close enough and look at the second example, my bad.

wouldn't (off the top of my head):

```perl
my @a = (43,43,44);
my @b = (43,43);
my %h = map {($_=>1)} @a;
delete @h{@b};
print join(",",keys %h);
```

work? -pete "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
Re^2: Difference arrays. by Skeeve (Parson) on Sep 04, 2008 at 19:35 UTC
work? No! Try it with the other example of the question.
Re: Difference arrays. by repellent (Priest) on Sep 05, 2008 at 02:38 UTC
Perhaps I'm missing something, but here's what I got:

```perl
my @a = ( 1,1,1,1,1,2,2,2,3,3,4,5,6);
my @b = ( 1,2,3,4,5,6 );
my @c = map {
    my $found = 0;
    my $m = $_;
    for (1 .. @b) {
        my $n = shift @b;
        ++$found and last if $m == $n;
        push @b, $n;
    }
    $found ? () : $m;
} @a;
```

Update: OK, I think I got it to work. No hash here, so it may save more space at the expense of time.
Re: Difference arrays. by mr_mischief (Monsignor) on Sep 05, 2008 at 05:36 UTC
This should trade lower memory usage for potentially longer run times before accounting for swapping effects. This version assumes one can modify @a for even a bit more memory savings:

```perl
my @a = ( 42, 42, 43, 43, 43, 44, 45, 46 );
my @b = ( 43, 45 );
for my $i ( 0 .. $#b ) {
    for my $j ( 0 .. $#a ) {
        next unless defined $a[ $j ];
        $a[ $j ] = undef, last if $a[ $j ] == $b[ $i ];
    }
}
@a = grep { defined } @a;
print join ', ', @a;
print "\n";
```

outputs: `42, 42, 43, 43, 44, 46`

This obviously trivial modification is more conservative and assumes one can't modify @a:

```perl
my @a = ( 42, 42, 43, 43, 43, 44, 45, 46 );
my @b = ( 43, 45 );
my @c = @a;
for my $i ( 0 .. $#b ) {
    for my $j ( 0 .. $#c ) {
        next unless defined $c[ $j ];
        $c[ $j ] = undef, last if $c[ $j ] == $b[ $i ];
    }
}
@c = grep { defined } @c;
print join ', ', @c;
print "\n";
```

The second outputs the same as the first. Neither cares if the arrays are presorted, because it's O(m*n) and checking each against each already.
Re^2: Difference arrays. by ikegami (Patriarch) on Sep 05, 2008 at 05:44 UTC
In-place sorting would be much faster [ O(N log N) instead of O(N²) ] and can be written to use O(1) extra memory. However, it destroys the original arrays. Also, you're wasting memory by placing @c on the stack. You could drop your memory usage from O(N) to O(1) by compressing `@c` in-place. `grep` doesn't work in-place like `sort` when the source and destination are the same. Update: Added downside.
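As a sketch of what "compressing `@c` in-place" could look like (the post gives no code, so the sub name `compact_in_place` and its interface are assumptions): copy each defined element down to the next free slot, then truncate the array.

```perl
use strict;
use warnings;

# Hypothetical helper: squeezes undef entries out of the array itself
# instead of building a second list with grep.
sub compact_in_place {
    my ($arr) = @_;
    my $w = 0;    # index of the next slot to write to
    for my $r ( 0 .. $#{$arr} ) {
        next unless defined $arr->[$r];
        $arr->[ $w++ ] = $arr->[$r];    # move the survivor down
    }
    $#{$arr} = $w - 1;    # chop off the now-unused tail
    return;
}

my @c = ( 42, 42, undef, 43, 43, 44, undef, 46 );
compact_in_place( \@c );
print join( ', ', @c ), "\n";    # prints 42, 42, 43, 43, 44, 46
```

The relative order of the surviving elements is kept, and the only extra storage is the two indexes.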
Re^3: Difference arrays. by mr_mischief (Monsignor) on Sep 05, 2008 at 09:48 UTC
I really just posted it as an interesting alternative. The method of marking the array directly was the main focus. I already said it'd run more slowly than some others. It's actually not bad where the subset is 32 or so items or fewer, or if @a has lots of duplicates that happen to be in @b. It doesn't slow down from function calls in the tightly wound sections.

The grep is the biggest memory concern, and that's an implementation detail of the language. The original post asked for `grep` and a hash. I offered an array instead of a hash. That should save some memory by itself. I could splice @c (or @a) in the foreach, but perlsyn specifically forbids that. I could pop off each element and push it back on only if it's defined. That seems like a lot of work in response to a request for a simple solution which could include `grep`, and I'm sure BrowserUk could figure that part out anyway. Mine's already not the easiest here to understand.

If the memory use issue is due to thousands of small arrays quadrupling in size, then my solution could be useful. If the problem is that the actual production arrays are huge or that the subset arrays are fairly long, then it won't be.

That solution can also pretty easily be altered so that by sorting only @b any duplicates within @b do not cause a loop through @a again. It's not a giant optimization, but it could slow the growth substantially if the typical data set has lots of duplicates in the subset array. I have no idea how prevalent duplicates within that array actually are.

```perl
my @a = ( 42, 42, 43, 43, 43, 44, 45, 46, 41, -13 );
my @b = ( 43, 45, -13, 43 );
my @c = @a;
@b = sort { $a <=> $b } @b;
my $j = 0;
for my $i ( 0 .. $#b ) {
    # A repeated value in @b resumes the scan where the previous one stopped,
    # so each duplicate still removes one more element from @c.
    $j = 0 unless $i > 0 && $b[ $i ] == $b[ $i - 1 ];
    for ( ; $j <= $#c; $j++ ) {
        next unless defined $c[ $j ];
        $c[ $j ] = undef, last if $c[ $j ] == $b[ $i ];
    }
}
@c = grep { defined } @c;
print join ', ', @c;
print "\n";
```

BTW, why do you use a for loop to push the elements of @a onto @c? Why not `push @c, @a;` instead? Is that a memory optimization peculiar to how perl handles `push` with a list or array argument internally?
Re^4: Difference arrays. by ikegami (Patriarch) on Sep 05, 2008 at 22:12 UTC
Re: Difference arrays. by Anonymous Monk on Sep 05, 2008 at 05:16 UTC
Algorithm::Diff