dougbot has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks. I am a biologist trying to add Perl scripting to my skill set. I've been working through the "Unix and Perl Primer for Biologists", which has been a lot of fun http://korflab.ucdavis.edu/unix_and_Perl/.

I've been having so much fun with it that I have been screwing around with non-biological problems, including writing a multi-word anagram solver. I realize that many people have done this before, but struggling through making my own has taught me quite a bit that I didn't glean from the course.

Anyway, I've run into a bit of a snag when trying to subtract a found anagram from the larger letter list from which it came. I'd like to be able to take the leftover letters and search for more anagrams within them.

I did a lot of searching, and was initially trying to do this by converting the found anagram to a hash, and then asking if each character in the letter list existed in the hash. The problem with this approach is that if a particular word has multiple occurrences of the same letter (like "butt"), only one of the letters can be loaded into the hash because the keys have to be unique ("butt" becomes "but").

I've since come up with this, which sort of works, as long as I don't use warnings (I realize this is bad!):

    #!/usr/bin/perl
    use strict;

    my @letters = qw(a a a b b b c c c);
    my @word    = qw(a a b b);
    my $length  = @letters;
    my $wlength = @word;

    for (my $i = 0; $i < $length; $i++) {
        for (my $j = 0; $j < $wlength; $j++) {
            if (($letters[$i]) eq ($word[$j])) {
                splice (@letters, $i, 1);
                splice (@word, $j, 1);
            }
        }
    }
    print "@letters\n";

This works as long as the "word" contains fewer copies of a given character than the "letters", but it will not remove the final copy of a character if both contain the same number of occurrences. For example, using the above code to subtract "a a b b" from "a a a b b b c c c" gives "a b c c c". However, when I subtract "a a a b b" from the same set of letters, I still get "a b c c c" instead of the desired "b c c c".

I am thinking that there is a better way to approach this problem, but I can't for the life of me find an easier way to do it. Thanks in advance for any guidance you can provide!!

Replies are listed 'Best First'.
Re: Subtracting one array from another (both have duplicate entries)
by tangent (Parson) on Dec 04, 2013 at 01:49 UTC
    You can use a hash to keep track of each letter by setting the value of each key to the number of times it appears in the word and then decrementing that value each time that letter is removed from @letters. The code below creates a new array to hold the remaining letters. It is probably not the most efficient way, but it may help:
    use strict;
    use warnings;

    my @letters = qw(a a a b b b c c c);
    my @word    = qw(a a a b b);

    # create the hash
    my %count;
    for my $letter (@word) {
        $count{$letter}++
    }

    my @remain;
    for my $letter (@letters) {
        if (not $count{$letter}) {
            push(@remain,$letter);
        }
        else {
            $count{$letter}--;
        }
    }
    print "@remain\n";
    # prints: b c c c
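The counting approach above can be wrapped in a reusable sub, which is handy when the leftover letters are fed back into another anagram search. What follows is only a minimal sketch: the name subtract_letters and the choice to return undef when the word does not fit in the letter pool are assumptions made for illustration, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return convention are assumptions):
# remove the letters of @$word from @$letters, honouring duplicates.
# Returns an array ref of leftover letters in their original order,
# or undef if the word needs a letter the pool cannot supply.
sub subtract_letters {
    my ($letters, $word) = @_;

    # How many copies of each letter still have to be removed.
    my %need;
    $need{$_}++ for @$word;

    my @remain;
    for my $letter (@$letters) {
        if ($need{$letter}) {
            $need{$letter}--;         # consume one copy for the word
        }
        else {
            push @remain, $letter;    # not needed by the word, keep it
        }
    }

    # A count still above zero means the word did not fit in the pool.
    return if grep { $_ > 0 } values %need;
    return \@remain;
}

my $left = subtract_letters([qw(a a a b b b c c c)], [qw(a a a b b)]);
print "@$left\n";    # prints: b c c c
```

Calling the sub in scalar context, as above, makes the "word does not fit" case easy to test with defined().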
      Awesome! I guess that should have been an obvious way to use a hash for me, but as you can probably tell, I am very much a novice. It is freaking hard learning to write code at this point in my life, but I am pretty sure it is good for my brain :-) Thanks again!
Re: Subtracting one array from another (both have duplicate entries)
by BrowserUk (Patriarch) on Dec 04, 2013 at 02:51 UTC

    @letters = qw(a a a b b b c c c);;
    @word = qw(a a a b b);;
    ++$h{$_} for @letters;;
    --$h{$_} for @word;;
    print map{ ($_) x $h{$_} } keys %h;;
    c c c b
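The snippet above counts every letter in the pool, subtracts the counts for the word, and then repeats each key as many times as its remaining count. A sketch of the same idea as a standalone script under strict and warnings might look like this. The sub name leftover_from_counts is hypothetical. Two things differ from the original and are my own additions: the keys are sorted, because Perl's hash order is unspecified, and counts are clamped at zero so that a word letter missing from the pool does not produce a negative repeat count.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the count-up / count-down idea.
sub leftover_from_counts {
    my ($letters, $word) = @_;

    my %h;
    ++$h{$_} for @$letters;    # count each letter in the pool
    --$h{$_} for @$word;       # subtract the letters used by the word

    # Repeat each letter by its remaining count. Sorting the keys gives
    # a stable order; zero or negative counts contribute nothing.
    return map { ($_) x ($h{$_} > 0 ? $h{$_} : 0) } sort keys %h;
}

my @rest = leftover_from_counts([qw(a a a b b b c c c)], [qw(a a a b b)]);
print "@rest\n";    # prints: b c c c
```

Unlike the array-walking version, this one does not preserve the original order of the letters, which usually does not matter for anagram searching.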

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Subtracting one array from another (both have duplicate entries)
by karlgoethebier (Abbot) on Dec 04, 2013 at 15:12 UTC
    use Data::Dumper;
    use Data::Difference qw(data_diff);

    my @letters = qw( a a a b b b c c c );
    my @word    = qw( a a a b b);

    my @diff = data_diff( \@letters, \@word );
    print Dumper( \@diff );
    for (@diff) { print $_->{a} }

    __END__
    $VAR1 = [
              { 'a' => 'b', 'path' => [ 5 ] },
              { 'a' => 'c', 'path' => [ 6 ] },
              { 'a' => 'c', 'path' => [ 7 ] },
              { 'a' => 'c', 'path' => [ 8 ] }
            ];
    bccc

    Update: "If you don't know the algorithm quickly, it's better to check out a module." (Karl Goethebier, PerlMustard)

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Subtracting one array from another (both have duplicate entries)
by GotToBTru (Prior) on Dec 04, 2013 at 21:47 UTC
    What you really want to work with here is a set instead of an array. Back in algorithms class we implemented a full set of operations for sets using arrays (in Pascal, no less!). I bet that would be easier using hashes. Or, there's Set::Scalar, Set::Formula, etc. for those averse to reinventing the wheel.
Re: Subtracting one array from another (both have duplicate entries)
by QM (Parson) on Dec 10, 2013 at 13:18 UTC
    There's a lot of good stuff on Perl Monks. You reminded me of some of my previous endeavors (and a script I lost when I moved $work). For example, this looks at using strings instead of arrays, and creating a regex to match against a dictionary (hashed lexicographically). The entire thread might be useful to you.
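One possible reading of the string-based idea is sketched below. It is not the code from the linked thread, and the sub names signature and subtract_string are hypothetical. The first sub builds a sorted-letter key, so that all anagrams of a word share the same key in a dictionary hash. The second removes a word from a pool of letters held in a string, one character at a time.

```perl
use strict;
use warnings;

# Canonical key for a word: its letters, lower-cased and sorted.
# All anagrams of the same letters produce the same key.
sub signature {
    my ($word) = @_;
    return join '', sort { $a cmp $b } split //, lc $word;
}

# Remove each letter of $word from $pool exactly once.
# Returns the leftover string, or undef if a letter is missing.
sub subtract_string {
    my ($pool, $word) = @_;
    for my $ch (split //, $word) {
        # Without /g, s/// deletes only the first occurrence;
        # a failed match means the pool has run out of this letter.
        return unless $pool =~ s/\Q$ch\E//;
    }
    return $pool;
}

my $rest = subtract_string('aaabbbccc', 'aaabb');
print "$rest\n";                  # prints: bccc
print signature('Butt'), "\n";    # prints: bttu
```

Keeping the pool as a string avoids array index bookkeeping altogether, at the cost of one substitution per letter.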

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of