in reply to Better way to search array for elements in other array?

echo5,
It may be advantageous to look at Regexp::Assemble or Regexp::Trie. Note that, thanks to demerphq, in perl 5.10 and beyond the regexp alternation optimization of a trie is built in.

In other words, take the smaller of the two lists and turn it into a single regular expression. This turns your algorithm from O(N * M) to O(N). If this still doesn't make sense, please let me know and I will provide a working example that you can tweak to see if it fits your needs.

Update: The code snippet if (index(lc($data),lc($string)) ge 0) { should likely just be if (index($lc_data, $lc_string) > -1) {. First, you should use > instead of ge when dealing with numerical values (see perlop). Second, repeatedly lowercasing the values is wasted time. Do it once and then compare the copy.

Cheers - L~R

Replies are listed 'Best First'.
Re^2: Better way to search array for elements in other array?
by tilly (Archbishop) on Jan 24, 2011 at 20:10 UTC
    This turns your algorithm from O(N * M) to O(N).

    Not exactly. Suppose that L is the length of the strings you are trying to match. With the naive approach the RE engine can take advantage of Boyer-Moore, so its running time should be about O(N * M / L). With the trie approach the number of times that the re engine will start trying to match is O(N), however each attempted match that fails can cost up to O(L), so your running time is potentially as high as O(N * L). (Usually you won't hit this worst case though.)

    If you have a lot of strings to match, this is potentially a big win. If you have a small number of long strings to match, this is probably not a good optimization.

      tilly,
      I was very careful through out my post to say things like "may" and "to see if it fits". I should have said of the algorithm analysis....if you don't look too closely, this turns your algorithm.... I appreciate the clarification. I have other ideas about how this could be optimized for run time but the OP didn't provide enough real sample data.

      Cheers - L~R

Re^2: Better way to search array for elements in other array?
by ikegami (Patriarch) on Jan 24, 2011 at 20:37 UTC

    First, you should use > instead of ge when dealing with numerical values

    Well, ">=" would be the numerical equivalent to the string comparison operator "ge", so index(...) >= 0 would be perfectly fine.

    The reason the string comparison operators are not suitable for numerical comparisons is that they don't use different rules than the numerical comparison operators.

    $ perl -E'say 2 >= 10 ? 1 :0' 0 $ perl -E'say 2 ge 10 ? 1 :0' 1 # Because 2 ge 1
      ikegami,
      Well, ">=" would be the numerical equivalent

      You took that out of context. I also changed the 0 to a -1 which made > appropriate. Regarding the explanation for string operators versus numerical operators, I thought a pointer to perlop sufficient.

      Cheers - L~R

        I was telling the OP that he didn't need to do such a change, so it was hardly out of context.