keiusui has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of strings:

@s = ('abc', 'dog', 'cat', 'rabbit', 'attic');

and I want to determine if one of the elements of the list is the word 'at'.

'at' is not in the list, although 'cat' and 'attic' are.

In order to search the list, do I have to loop through every element in the list (using the foreach function) until I find a match?

Or, is there a more efficient way of searching a list?

Clearly in my list above, it isn't a hassle to loop through 5 elements in a list. But I'd like to know if there is a more efficient way of searching a list, especially if the list contains hundreds of elements.

Thank you so much!

Replies are listed 'Best First'.
Re: searching a list
by davido (Cardinal) on Jun 27, 2006 at 04:34 UTC

    There is no way to check every element in a list without looking at every element in a list. This is not a Perl problem, nor is it a problem unique to Computer Science; it's a real-world problem. Unless you look at every oyster, you won't know which ones have pearls within. Statistical analysis might gain you insight regarding the probability of finding a certain number of pearls given a certain quantity of oysters, but until you open a given oyster, you won't know if it actually contains one.

    There are shortcuts for look-ups, but in order for these shortcuts to exist, a sort of index, or some form of a hashing algorithm has to be generated or applied in the first place.

    The fastest average-case look-up of an element within an array will be O(n). Ok, O(n/2) would be average case if you quit after finding the first occurrence, but in Big-O notation you generally drop the constant-value multiplicative co-efficient (ie, O(2n) becomes O(n), just like O(0.5n) is usually regarded as O(n)). Creation of the array takes O(n) too.

    On the other hand, hash look-ups take O(1) time, but hash creation takes O(n log n). So a hash takes a little longer than a simple array to create, and doesn't maintain "order", but does provide faster look-ups.

    There are a lot of other strategies too, but they all boil down to either keeping track of something at creation time, or keeping things in order at creation time to facilitate searches later, or blindly searching an unrefined list.

    You might be able to hide the complexity of looping behind subroutines such as grep or first(), but a loop is still taking place behind the scenes.


    Dave

Re: searching a list
by bobf (Monsignor) on Jun 27, 2006 at 04:14 UTC

    Please see perlfaq4: "How can I tell whether a certain element is contained in a list or array?", and Getting Matching Items From An Array.

    If that isn't what you intended, please clarify the question. Thanks.

    Update: added a link to Limbic~Region's most excellent companion document to the FAQ.

      bobf,
      I wrote a compliment tutorial to the FAQ - Getting Matching Items From An Array. It is intended to help people decide what method is appropriate for them and how to adjust to meet their specific needs as not all solutions are one size fits all.

      Cheers - L~R

Re: searching a list
by davidrw (Prior) on Jun 27, 2006 at 04:15 UTC
    you can grep but that goes through the whole list:
    my $ct = scalar grep { $_ eq 'at' } @s;
    Or you can look your self and break when you get a hit:
    my $found; foreach ( @s ){ next unless $_ eq 'at'; $found = 1; last; }
    I would also suspect that List::Util has something useful..

    Yet another option, if you're ok w/more or less doubling the memory and going through the whole list, is to hash it up (maybe useful elsewhere in your code so it's worth it?):
    my %h; $h{$_}++ for @s; warn "got " . ($h{at}||0) . " hits for 'at'";

      I would also suspect that List::Util has something useful

      yeah, first.

      use List::Util qw( first ); my $found = defined first { $_ eq 'at' } @s;

      By the way,
      my $ct = scalar ...
      is the same as
      my $ct = ...

        By the way, my $ct = scalar ... is the same as my $ct = ...

        Yeah, true... but I think some people (myself included) adopt the practice (in these sorts of situations) of using scalar anyway, just so as to remove any doubt about the intention. Personally, I think it's not a bad practice, and I'm not aware of any performance hit - or is there? :)

Re: searching a list
by Zaxo (Archbishop) on Jun 27, 2006 at 04:17 UTC

    A hash provides the best way I know to search,

    my %hash; @hash{'abc', 'dog', 'cat', 'rabbit', 'attic'} = (); for (qw/cat at play with rabbit/) { print $_, " is ", exists($hash{$_})? '', 'not ', 'in the list', $/; }

    After Compline,
    Zaxo

    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: searching a list
by esskar (Deacon) on Jun 27, 2006 at 13:05 UTC
    searching a list, i always like binary search. of course, the list has to be sorted tough. here is a little script
    use strict; use warnings; my @s = ( 'abc', 'dog', 'cat', 'rabbit', 'attic' ); my @sorted_copy = sort { $a cmp $b } @s; print ' at:' . contains_string( 'at', \@sorted_copy ) . "\n"; print 'cat:' . contains_string( 'cat', \@sorted_copy ) . "\n"; sub contains_string { my ( $var, $array ) = @_; return binary_string_search( $var, $array ) > -1; } sub binary_string_search { my ( $var, $array, $lo, $hi ) = @_; $lo = 0 unless defined $lo; $hi = $#{ @{$array} } unless defined $hi; while ( $lo <= $hi ) { # determine new division point my $mid = $lo + int( ( $hi - $lo ) / 2 ); if ( $array->[$mid] eq $var ) { return $mid; } elsif ( $var le $array->[$mid] ) { $hi = $mid - 1; } else { $lo = $mid + 1; } } # nothing found return -1; }
    i think you can achieve a speedy solution when searching a big list for more than one item.

    UPDATE: minor code changes.