Fideist11 has asked for the wisdom of the Perl Monks concerning the following question:

Say we pass a function two variables, both are strings. We want to return true or false depending on whether we can form the string in the second argument from the letters in the string in the first argument. Here's my working version. My question is what is the best way to do this with efficiency being of utmost importance? (since i do this about a million times). Is there a faster regex that could do this better than my logic algorithm? or can someone think of a better logical solution altogether? thanks a lot for your help.
#This bool function returns whether the second argument #string can be built with the first arguments letters sub match($$$) { my ($self,$letters,$word) = @_; $letters = lc($letters); $word = lc($word); #Now we build our hashes my (%letter_hash, %word_hash); while($letters) { $letter_hash{chop($letters)}++; } while($word) { $word_hash{chop($word)}++; } #Now to analyze my ($key, $value); while(($key,$value) = each(%word_hash)) { $letter_hash{$key} = $letter_hash{$key} - $value; return 0 if ($letter_hash{$key} < 0); } #Must be good return 1; }

Replies are listed 'Best First'.
Re: Finding One String in Another
by Abigail-II (Bishop) on Jul 31, 2002 at 16:51 UTC
    What the fastest algorithm will be depends on a lot of things, for instance whether you will have more failures than successes. Size of the words are important too.

    Here's one algorithm that benefits from "early failures", as soon as a letter in the second word is processed for which there's no corresponding letter in the first word, it returns. It's also using an array to count, not a hash.

    sub match { my @h; ++ $h [ord] for split // => shift; -- $h [ord] < 0 and return for split // => shift; 1; }
    It's a small (and untested) subroutine. The benefit of using an array, and splitting on an empty regex is that it can be ported to C trivially. If speed is really important, I'd consider using Inline::C.

    Note also that you could archieve major speedup by not calling the match function a million times - instead call it with a list of pairs of arguments to process. Calling a function is not a cheap task. However, you need to be careful and not kill the gained performance by using too much memory.

    Abigail

Re: Finding One String in Another
by particle (Vicar) on Jul 31, 2002 at 16:38 UTC
    sub match { my( $letters, $word ) = map lc, @_; return 0 if length $letters < length $word; my %letters_in_word; $letters_in_word{$_}++ for split //, $word; $letters_in_word{$_}-- for split //, $letters; for( values %letters_in_word) { return 0 if $_ > 0 } return 1 } print match('ABCDE', 'cab'); #prints 1 print match('abcde', 'cat'); #prints 0
    did i get your logic correct? i'm not sure how you're handling multiple letters... in my code they must be multiple in each word

    and why are you using prototypes?

    ~Particle *accelerates*

Re: Finding One String in Another
by marvell (Pilgrim) on Jul 31, 2002 at 16:42 UTC
    A million times in how long?

    I like the logic, but would be inclined to slim the code a tad:

    Updated to include length check.

    sub match($$$) { my ($self,$letters,$word) = @_; return 0 if length $letters < length $word; $lcount{$_}++ for lc($letters) =~ /./g; for (lc($word) =~ /./g) { return 0 unless $lcount{$_}--; } return 1; }

    --
    Steve Marvell

Re: Finding One String in Another
by RMGir (Prior) on Jul 31, 2002 at 16:45 UTC
    I don't know if this is a big win, but you can do
    #Now we build our hashes foreach(split "",$letters) { $letter_hash{$_}++; } foreach(split "",$word) { $word_hash{$_}++; } while(my($key, $value)=each(%word_hash)) { # no need to update $letter_hash{$key} return 0 if $letter_hash{$key}<$value; } return 1;
    This will work even if letters or words ends in a zero character, which would end your loop.
    --
    Mike
Re: Finding One String in Another
by fglock (Vicar) on Jul 31, 2002 at 16:46 UTC

    Try this:

     perl -e ' $a="abc"; $b="aCd"; @m = $a=~ /([^$b])/i; print "@m\n"'

    Array @m will be empty whenever $a can be built from the letters in $b !

    update: This regex says that "aa" can be built from letters in "a". But it can be used as a filter, before applying a more complex algorithm.

    This is an example on how to use it (I don't claim it is perfect :)

    use strict; my ($a,$b) = (shift, shift); my @m = $a =~ /([^$b])/ig; if ($#m < 0) { print "ok, can build '$a' using letters in '$b'\n"; foreach my $letter (split // => $a) { unless ($b =~ s/$letter//) { print "but will need an extra '$letter'\n" }; } } else { print "can't build '$a' because I don't have '",join("' and '", @m +),"'\n"; }

    update: use of /ig may make it slower, but it gives an estimate on how many characteres are missing

      I think not!
      $ perl -e ' $a="aa"; $b="a"; @m = $a=~ /([^$b])/i; print "@m\n"' $
      But you can't turn one a into two.

      Abigail

Re: Finding One String in Another
by Fideist11 (Sexton) on Jul 31, 2002 at 16:30 UTC
    oh yes i already noticed putting
    return 0 if (length($word) > length($letters));
    at the top would save me time in many instances. Since there's no way we could make $word out of $letters if $word is longer to start with...
Re: Finding One String in Another
by marvell (Pilgrim) on Jul 31, 2002 at 16:52 UTC
    I say give it five minutes and then benchmark the proposed solutions.

    Out of interest, how long are these strings, in general?

    --
    Steve Marvell