rooneyl has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am hoping someone could give me some pointers. As a way of trying to learn Perl, I decided to write a word game for my son. I am having problems will how to deal with one aspect of it. I want to be able to compare the letters (and how often they occur) that make up the first word are present in the second. For example 'dog' compared with 'good' would return true, 'food' compared with 'fodder' would return false as there is only one 'o'. I am thinking that maybe I could convert each scalar into an array and then compare each element in turn, is this possible? Any help would be gratefully received. Thank you in advance

Replies are listed 'Best First'.
Re: Word Comparison
by pc88mxer (Vicar) on Mar 26, 2008 at 22:48 UTC
    There is a more-or-less conventional way to do it, and then there are those that take advantage of perl's regular expression capabilities. Here's one that uses the 's' operator:
    sub contains { # return true if $_[0] is contained in $_[1] my ($s, $t) = @_; for (split('', $s)) { return 0 unless ($t =~ s/$_//); } 1; } print contains("dog", "good"), "\n"; # -> 1 print contains("food", "fodder"), "\n"; # -> 0
    Update: The 'conventional' way would be to perform a count of each letter and compare counts:
    sub contains { my ($s, $t) = @_; my (%sc, %tc); for (split('', $s)) { $sc{$_}++ }; for (split('', $t)) { $tc($_}++ }; for my $k (keys %sc) { return 0 unless $sc{$k} <= $tc{$k}; } 1; }
    While not the most efficient in this situation, this code exemplifies common types of data processing that is done with perl, so it's valuable to know about.
Re: Word Comparison
by oko1 (Deacon) on Mar 26, 2008 at 23:09 UTC

    Interesting how this keeps coming up - it just so happens that I've been working on this class of problems recently.

    #!/usr/bin/perl -w use strict; my @words = ("dog good", "food fodder"); TOP: for (@words){ my ($first, $second) = split; for my $ltr (split //, $first){ unless ($second =~ s/$ltr//){ print "'$_' does not match.\n"; next TOP; } } print "'$_' matches.\n"; }

    Running it produces the following:

    'dog good' matches.
    'food fodder' does not match.
    

    It gets much more interesting when the words are made up of cards that can be either one or two letters long... :)

Re: Word Comparison
by mobiusinversion (Beadle) on Mar 26, 2008 at 23:32 UTC
    It looks like you want to establish word similarity without regard to order or case. Try a hash (it will give you the flexibility to modify the rules of the game later).
    my $word = 'bantha fodder'; my @letters = split//, lc($word); my %counts; for(@letters){ $counts{$_}++; } for(sort {$counts{$b} <=> $counts{$a}} keys %counts){ my $t = $_ =~ /[a-z]/ ? 'letter' : $_ =~ /[0-9]/ ? 'number' : 'character'; my $s = $counts{$_} > 1 ? 's' : ''; printf " the %-10s: %-3s ocurred %-3s time%s\n", $t,$_,$counts{$_},$s }
    which prints
    the letter : 'a' ocurred 2 times the letter : 'd' ocurred 2 times the letter : 'e' ocurred 1 time the letter : 'n' ocurred 1 time the letter : 'r' ocurred 1 time the character : ' ' ocurred 1 time the letter : 'h' ocurred 1 time the letter : 'b' ocurred 1 time the letter : 'f' ocurred 1 time the letter : 't' ocurred 1 time the letter : 'o' ocurred 1 time
    Now wrap that in a subroutine and use some loop control:
    sub letter_counts { my @letters = split//, lc(shift); my %counts; for(@letters){ $counts{$_}++; } return %counts } my $word1 = 'bantha fodder'; my $word2 = 'banana slug'; my %counts1 = letter_counts($word1); my %counts2 = letter_counts($word2); my $result = 'are equivalent'; test_loop: for(keys %counts1, keys %counts2){ if(!defined $counts2{$_} || !defined $counts1{$_}){ $result = 'are not equivalent'; last test_loop } } print "'$word1' and '$word2' $result\n";
    Now you can modify the loop / conditional as you see fit.

    For order dependent similarity, try Algorithm::Diff
Re: Word Comparison
by nefigah (Monk) on Mar 26, 2008 at 22:45 UTC

    Sounds like a fun task, and regular expressions will be your friend here... but from the example you listed, I'm not sure if the rules are totally clear to me? Could you come up with a list of criteria for "what makes a word close enough to another word"? Then we can give some more concrete ideas :)


    I'm a peripheral visionary... I can see into the future, but just way off to the side.

      I'll try an explain how the game works.
      You are presented with a word, for example 'perlmonksisgreat'.
      You have to try and make words from this using only the letters that are in the word, you can only use each letter as many times as it appears. In the above example you could use 1 x 'p' and 2 x 'e' and so on.
      Some examples of valid words are 'perl', 'slop', 'term'.
      An example of a non-valid word would be 'pepper' (only 1 'p').

        Ah, in that case, use pc88's first example below, it does exactly what you want :)


        I'm a peripheral visionary... I can see into the future, but just way off to the side.

Re: Word Comparison
by ysth (Canon) on Mar 27, 2008 at 03:55 UTC
    Maybe not super efficient in terms of time, but at least terse:
    for ([qw/dog good/],[qw/food fodder/]) { my ($first, $second) = @$_; print "$first does "; print "not " unless join("",sort split //,$second) =~ join(".*",sort split // +,$first); print "contain $second\n"; }
    I do this in word twist, which you may want to look at.
    join("",sort split //,$first) =~ ("^".join("?",sort(split //,$second)) +."?\\z")
    is possibly a faster way - I haven't benchmarked.
Re: Word Comparison
by hipowls (Curate) on Mar 27, 2008 at 05:01 UTC

    Now a regex variant;) I don't know if you'd want to use it but it does demonstrate building a regex at run time.

    while ( my $line = <DATA> ) { my ($new, $old) = split /\s+/, $line; my $rx = regex($new); print "$new can ", ( $old =~ /$rx/? '': 'NOT ' ), "be made from $o +ld\n"; } sub regex { my %letters; $letters{$_}++ for split //, lc shift; my $regex = join '', map { "(?=(?:.*${_}){$letters{$_}})" } keys % +letters; return qr/\A$regex/si; } __DATA__ dog good food fodder lot total fuse useful poor porridge root rotor __END__ dog can be made from good food can NOT be made from fodder lot can be made from total fuse can be made from useful poor can NOT be made from porridge root can be made from rotor
    I'll show the regular expression generated from 'root' in detail
    my $regex = qr{ \A # anchor at start for efficiency (?= # positive lookahead for (?: .* # anything r # with an 'r' after it ){1} # at least once ) (?= # positive lookahead for (?:.*t){1} # a 't' ) (?= # positive lookahead for (?:.*o){2} # for two 'o's ) }six; # ignore case
    All the lookaheads have to succeed for the regex to match.

    On a side note I benchmarked the variants (?:.*r), (?:[^r]*+r) and (?:[^r*]r) on different input strings and found the first was usually fastest.