in reply to Beauty is in the eye of the beholder

Yes, grep() is a bad idea here. And your regex, sadly, is an even poorer one. The FAQ warns you about this, believe it or not, in the same answer about how to determine if a list contains an element.
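
To make the regex complaint concrete, here is a minimal sketch. The word list and the $wanted value are made up for illustration and are not taken from the original post. An unanchored pattern matches any element that merely contains the string, while eq matches only the exact value. Either way, grep() still walks the entire list even after a hit.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @words  = qw(foobar bazaar quux);
my $wanted = 'bar';

# Unanchored regex: counts every element that merely CONTAINS "bar".
my $regex_hits = grep { /$wanted/ } @words;

# String equality: counts exact matches only.
my $exact_hits = grep { $_ eq $wanted } @words;

print "regex hits: $regex_hits, exact hits: $exact_hits\n";
```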

I believe the List::Util module offers a function called first(), which looks like this:
    sub first (&@) { my $cref = shift; $cref->($_) and return $_ for @_; }
It returns the first value in a list for which a chunk of code returns true:
    # get the first number found over 1000
    $first_match = first { $_ > 1000 } @numbers;
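
In practice you would probably import first() rather than write it yourself. A small sketch, assuming the core List::Util module is the one meant above; the @numbers values are invented sample data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @numbers = (12, 999, 1500, 42, 2000);    # sample data, assumed

# first() stops at the first element for which the block is true
# and returns undef if no element qualifies.
my $first_match = first { $_ > 1000 } @numbers;

print defined $first_match
    ? "found $first_match\n"
    : "nothing over 1000\n";
```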
But if you do this a lot of times, you're kissing inefficiency on its ratty lips. Use a hash:
my %seen; @seen{@array} = ();
Now you can use the exists() test:
if (exists $seen{blah}) { ... }
Again, read the FAQ. Look for the keyword "contains".
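
Put together, the hash approach might look like the sketch below. The contents of @array are invented sample data. The point is that the hash is built once and every later membership test is a single lookup.

```perl
use strict;
use warnings;

my @array = qw(apple banana cherry);    # sample data, assumed

# Build the lookup table once: keys are the elements, values are undef.
my %seen;
@seen{@array} = ();

# Each membership test is now one hash lookup, not a scan of @array.
for my $item (qw(banana blah)) {
    print "$item: ", (exists $seen{$item} ? "present" : "absent"), "\n";
}
```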

japhy -- Perl and Regex Hacker

Re: Re: Beauty is in the eye of the beholder
by McD (Chaplain) on Feb 21, 2001 at 02:31 UTC
    One thousand pardons, m'lord japhy, but you wrote:

    But if you do this a lot of times, you're kissing inefficiency on its ratty lips. Use a hash:

    In the oft-maligned name of pragmatism, I cobbled together the following brief, unscientific test:

    #!/usr/bin/perl -w
    use Benchmark;

    @List = ("aaaa" .. "zzzz");
    print "Elements:", scalar @List, "\n";

    sub first (&@) { my $cref = shift; $cref->($_) and return $_ for @_; }

    $t0 = new Benchmark;
    $first_match = first {$_ eq "zzzz"} @List;
    $t1 = new Benchmark;
    $td = timediff($t1, $t0);
    print "Found it! (sub)\n" if $first_match;
    print "sub took: ", timestr($td), "\n";

    # Crufty way to figure out how big I am on a Linux box - /msg me with
    # improvements, please! -McD
    $size = (split(" ", `ps -hlp $$`))[6];
    print "Size: $size\n";

    $t0 = new Benchmark;
    my $Found=0;
    for (@List) {
      if ($_ eq "zzzz") {
        $Found=1;
        last;
      }
    }
    $t1 = new Benchmark;
    $td = timediff($t1, $t0);
    print "Found it! (scan)\n" if $Found;
    print "scan took: ", timestr($td), "\n";

    $size = (split(" ", `ps -hlp $$`))[6];
    print "Size: $size\n";

    $t0 = new Benchmark;
    my %seen;
    @seen{@List} = ();
    $t1 = new Benchmark;
    $td = timediff($t1, $t0);
    if (exists $seen{"zzzz"}) {
      print "Found it! (hash)\n";
    }
    print "hash took: ", timestr($td), "\n";

    $size = (split(" ", `ps -hlp $$`))[6];
    print "Size: $size\n";

    To make this a worst-case for the linear teams, I searched for the last item in the list.

    As it turns out, the least appealing code is the fastest - but while the hash approach may not be kissing performance inefficiency on its ratty lips, it's certainly been caught in some kind of carnal embrace with memory inefficiency.

    Here are the results on my box:

    ./existance.pl
    Elements:456976
    Found it! (sub)
    sub took:  2 wallclock secs ( 1.49 usr +  0.01 sys =  1.50 CPU)
    Size: 58632
    Found it! (scan)
    scan took:  1 wallclock secs ( 0.40 usr +  0.00 sys =  0.40 CPU)
    Size: 58632
    Found it! (hash)
    hash took:  2 wallclock secs ( 0.90 usr +  0.15 sys =  1.05 CPU)
    Size: 88324
    
    Of course, we've strayed far from meditation and into experimentation. I'm sorry, what were we optimizing for again? :-)

    Peace,
    -McD

      You quoted me, and then didn't follow my lead. "But if you do this a lot of times, you're kissing inefficiency on its ratty lips."

      You performed these tests ONCE each. Use the first() function several times. Scan through the array several times. Create the hash ONCE, and use exists() many times.

      japhy -- Perl and Regex Hacker
        Nuts. You're absolutely right.

        I misunderstood what you meant at first - now I see. This is a perfect example of the classic tradeoff of speed vs. memory.

        Which brings us back to meditation, after all, doesn't it?

        Time for a beer, methinks. All this tinkering and meditating is thirsty work. Thanks for following up!

        Peace,
        -McD

      Here's my test. first() can be made faster by passing an array reference, not the array.
      (RESULTS)
      Elements:17576
      sub took: 52 (51.14 usr + 0.00 sys = 51.14 CPU)
      scan took: 17 (13.85 usr + 0.00 sys = 13.85 CPU)
      hash took: 0 ( 0.17 usr + 0.13 sys = 0.30 CPU)

      (CODE)
      #!/usr/bin/perl -w
      use Benchmark;

      @List = ("aaa" .. "zzz");
      print "Elements:", scalar @List, "\n";

      # 100 elements picked at random to search for
      @rand_list = map $List[rand @List], 1 .. 100;

      sub first (&@) { my $cref = shift; $cref->($_) and return $_ for @_; }

      $t0 = new Benchmark;
      for (@rand_list) {
        $rand_element = $_;
        $first_match = first {$_ eq $rand_element } @List;
      }
      $t1 = new Benchmark;
      $td = timediff($t1, $t0);
      print "sub took: ", timestr($td), "\n";

      $t0 = new Benchmark;
      for (@rand_list) {
        $rand_element = $_;
        for (@List) { last if $_ eq $rand_element }
      }
      $t1 = new Benchmark;
      $td = timediff($t1, $t0);
      print "scan took: ", timestr($td), "\n";

      $t0 = new Benchmark;
      my %seen;
      @seen{@List} = ();
      $t1 = new Benchmark;
      # note: $td covers building the hash only; the lookups below are untimed
      $td = timediff($t1, $t0);
      for (@rand_list) { 1 if exists $seen{$_} }
      print "hash took: ", timestr($td), "\n";
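
      The array-reference variant mentioned at the top of this post is not shown in the test. A rough sketch of what it might look like follows; the name first_ref and its (&$) prototype are hypothetical, chosen only for illustration, and no timing is claimed for it here. The idea is that the list is not flattened onto the argument stack on every call.

      ```perl
      use strict;
      use warnings;

      # Hypothetical variant: takes a code block and an array REFERENCE,
      # so the whole list is not flattened into @_ on each call.
      sub first_ref (&$) {
          my ($cref, $aref) = @_;
          for (@$aref) {
              return $_ if $cref->($_);
          }
          return;    # nothing matched
      }

      my @List = ("aaa" .. "zzz");
      my $first_match = first_ref { $_ eq "zzz" } \@List;
      print "found $first_match\n" if defined $first_match;
      ```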


      japhy -- Perl and Regex Hacker
Re: Re: Beauty is in the eye of the beholder
by ChOas (Curate) on Feb 20, 2001 at 18:19 UTC
    ;)))

    As you might have read, I would have used a hash....

    I was trying to find an example of beauty/efficiency;
    sadly enough, that was the best I could come up with ;))

    I would have probably used 'eq' instead of the regex too ;)))

    GreetZ!,
      ChOas

    print "profeth still\n" if /bird|devil/;
      Well, an interesting feature of Perl is that by using its data structures, one can often make a structure representing some complex algorithm (like a hash being used to eliminate duplicates).
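
      The duplicate-elimination case can be sketched in a few lines. The @items values are invented sample data; the idiom itself (a %seen hash inside grep) is the common one, not something specific to this thread.

      ```perl
      use strict;
      use warnings;

      my @items = qw(a b a c b d);    # sample data, assumed

      # The hash remembers what has been seen; grep keeps only the
      # first occurrence of each value, preserving the original order.
      my %seen;
      my @unique = grep { !$seen{$_}++ } @items;

      print "@unique\n";
      ```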

      japhy -- Perl and Regex Hacker