Finding One String in Another

Fideist11 has asked for the wisdom of the Perl Monks concerning the following question:

Say we pass a function two variables, both are strings. We want to return true or false depending on whether we can form the string in the second argument from the letters in the string in the first argument. Here's my working version. My question is what is the best way to do this with efficiency being of utmost importance? (since i do this about a million times). Is there a faster regex that could do this better than my logic algorithm? or can someone think of a better logical solution altogether? thanks a lot for your help.

#This bool function returns whether the second argument
#string can be built with the first arguments letters
sub match($$$) {
    my ($self,$letters,$word) = @_;
    $letters = lc($letters);
    $word = lc($word);


    #Now we build our hashes
    my (%letter_hash, %word_hash);
    while($letters) {
        $letter_hash{chop($letters)}++;
    }
    while($word) {
        $word_hash{chop($word)}++;
    }


    #Now to analyze
    my ($key, $value);
    while(($key,$value) = each(%word_hash)) {
        $letter_hash{$key} = $letter_hash{$key} - $value;
        return 0 if ($letter_hash{$key} < 0);
    }


    #Must be good
    return 1;
}
[download]

Comment on Finding One String in Another Download Code

Replies are listed 'Best First'.
Re: Finding One String in Another by Abigail-II (Bishop) on Jul 31, 2002 at 16:51 UTC
What the fastest algorithm will be depends on a lot of things, for instance whether you will have more failures than successes. Size of the words are important too. Here's one algorithm that benefits from "early failures", as soon as a letter in the second word is processed for which there's no corresponding letter in the first word, it returns. It's also using an array to count, not a hash. `sub match { my @h; ++ $h [ord] for split // => shift; -- $h [ord] < 0 and return for split // => shift; 1; }` [download] It's a small (and untested) subroutine. The benefit of using an array, and splitting on an empty regex is that it can be ported to C trivially. If speed is really important, I'd consider using `Inline::C`. Note also that you could archieve major speedup by not calling the match function a million times - instead call it with a list of pairs of arguments to process. Calling a function is not a cheap task. However, you need to be careful and not kill the gained performance by using too much memory. Abigail	[reply] [d/l] [select]
Re: Finding One String in Another by particle (Vicar) on Jul 31, 2002 at 16:38 UTC
`sub match { my( $letters, $word ) = map lc, @_; return 0 if length $letters < length $word; my %letters_in_word; $letters_in_word{$_}++ for split //, $word; $letters_in_word{$_}-- for split //, $letters; for( values %letters_in_word) { return 0 if $_ > 0 } return 1 } print match('ABCDE', 'cab'); #prints 1 print match('abcde', 'cat'); #prints 0` [download] did i get your logic correct? i'm not sure how you're handling multiple letters... in my code they must be multiple in each word and why are you using prototypes? ~Particle accelerates	[reply] [d/l] [select]
Re: Finding One String in Another by marvell (Pilgrim) on Jul 31, 2002 at 16:42 UTC
A million times in how long? I like the logic, but would be inclined to slim the code a tad: Updated to include length check. `sub match($$$) { my ($self,$letters,$word) = @_; return 0 if length $letters < length $word; $lcount{$_}++ for lc($letters) =~ /./g; for (lc($word) =~ /./g) { return 0 unless $lcount{$_}--; } return 1; }` [download] -- Steve Marvell	[reply] [d/l]
Re: Finding One String in Another by RMGir (Prior) on Jul 31, 2002 at 16:45 UTC
I don't know if this is a big win, but you can do `#Now we build our hashes foreach(split "",$letters) { $letter_hash{$_}++; } foreach(split "",$word) { $word_hash{$_}++; } while(my($key, $value)=each(%word_hash)) { # no need to update $letter_hash{$key} return 0 if $letter_hash{$key}<$value; } return 1;` [download] This will work even if letters or words ends in a zero character, which would end your loop. -- Mike	[reply] [d/l]
Re: Finding One String in Another by Fideist11 (Sexton) on Jul 31, 2002 at 16:30 UTC
oh yes i already noticed putting `return 0 if (length($word) > length($letters));` [download] at the top would save me time in many instances. Since there's no way we could make $word out of $letters if $word is longer to start with...	[reply] [d/l]
Re: Finding One String in Another by fglock (Vicar) on Jul 31, 2002 at 16:46 UTC
Try this: `perl -e ' $a="abc"; $b="aCd"; @m = $a=~ /([^$b])/i; print "@m\n"'` Array @m will be empty whenever $a can be built from the letters in $b ! update: This regex says that "aa" can be built from letters in "a". But it can be used as a filter, before applying a more complex algorithm. This is an example on how to use it (I don't claim it is perfect :) `use strict; my ($a,$b) = (shift, shift); my @m = $a =~ /([^$b])/ig; if ($#m < 0) { print "ok, can build '$a' using letters in '$b'\n"; foreach my $letter (split // => $a) { unless ($b =~ s/$letter//) { print "but will need an extra '$letter'\n" }; } } else { print "can't build '$a' because I don't have '",join("' and '", @m +),"'\n"; }` [download] update: use of /ig may make it slower, but it gives an estimate on how many characteres are missing	[reply] [d/l] [select]
Re: Finding One String in Another by Abigail-II (Bishop) on Jul 31, 2002 at 16:59 UTC
I think not! `$ perl -e ' $a="aa"; $b="a"; @m = $a=~ /([^$b])/i; print "@m\n"' $` [download] But you can't turn one a into two. Abigail	[reply] [d/l]
Re: Finding One String in Another by marvell (Pilgrim) on Jul 31, 2002 at 16:52 UTC
I say give it five minutes and then benchmark the proposed solutions. Out of interest, how long are these strings, in general? -- Steve Marvell	[reply]