Re: Beauty is in the eye of the beholder

Yes, grep() is a bad idea here. And your regex, sadly, is an even poorer one. The FAQ warns you about this, believe it or not, in the same answer about how to determine if a list contains an element.

I believe the List::Util module offers a function called first() which looks like this:

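sub first (&@) {
  my $cref = shift;                   # the { ... } block arrives as a code ref
  $cref->($_) and return $_ for @_;   # return the first element the block accepts
  return;                             # no match: undef in scalar context
}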

So that it returns the first value in a list for which a chunk of code returns true:

# get the first number found over 1000
$first_match = first { $_ > 1000 } @numbers;
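A minimal sketch of the same idea using the first() exported by the core List::Util module. The contents of @numbers are made up for illustration, and the any() call at the end assumes List::Util 1.33 or later.

```perl
use strict;
use warnings;
use List::Util qw(first any);

# Hypothetical sample data, not from the thread.
my @numbers = (12, 999, 1500, 42, 2000);

# first() stops at the first element for which the block is true.
my $first_match = first { $_ > 1000 } @numbers;
print "first over 1000: $first_match\n";

# When nothing matches, first() returns undef.
my $none = first { $_ > 5000 } @numbers;
print "nothing over 5000\n" unless defined $none;

# For a plain yes/no membership test, any() avoids confusing a
# matched false value (such as 0) with "not found".
my $has_42 = any { $_ == 42 } @numbers;
print "42 is in the list\n" if $has_42;
```

Both functions still walk the list on every call, so this suits an occasional lookup rather than repeated ones.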

But if you do this a lot of times, you're kissing inefficiency on its ratty lips. Use a hash:

my %seen;
@seen{@array} = ();

Now you can use the exists() test:

if (exists $seen{blah}) { ... }
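Put together, the build-once, test-many pattern might look like the sketch below. The contents of @array and @wanted are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data, not from the thread.
my @array  = qw(foo bar blah baz);
my @wanted = qw(blah quux foo);

# Build the lookup table once: a single pass over @array.
# The hash slice assigns undef to every key; only the keys matter.
my %seen;
@seen{@array} = ();

# Each test is now one hash lookup instead of a scan of @array.
# exists() is required here: the stored values are undef, so a plain
# truth test on $seen{$item} would always fail.
my @found;
for my $item (@wanted) {
    push @found, $item if exists $seen{$item};
}

print "found: @found\n";
```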

Again, read the FAQ. Look for the keyword "contains".

japhy -- Perl and Regex Hacker


Replies are listed 'Best First'.
Re: Re: Beauty is in the eye of the beholder by McD (Chaplain) on Feb 21, 2001 at 02:31 UTC
One thousand pardons, m'lord japhy, but you wrote:

"But if you do this a lot of times, you're kissing inefficiency on its ratty lips. Use a hash:"

In the oft-maligned name of pragmatism, I cobbled together the following brief, unscientific test:

#!/usr/bin/perl -w
use Benchmark;

@List = ("aaaa" .. "zzzz");
print "Elements:", scalar @List, "\n";

sub first (&@) {
  my $cref = shift;
  $cref->($_) and return $_ for @_;
}

$t0 = new Benchmark;
$first_match = first {$_ eq "zzzz"} @List;
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "Found it! (sub)\n" if $first_match;
print "sub took: ", timestr($td), "\n";

# Crufty way to figure out how big I am on a Linux box - /msg me with
# improvements, please! -McD
$size = (split(" ", `ps -hlp $$`))[6];
print "Size: $size\n";

$t0 = new Benchmark;
my $Found=0;
for (@List) {
  if ($_ eq "zzzz") {
    $Found=1;
    last;
  }
}
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "Found it! (scan)\n" if $Found;
print "scan took: ", timestr($td), "\n";

$size = (split(" ", `ps -hlp $$`))[6];
print "Size: $size\n";

$t0 = new Benchmark;
my %seen;
@seen{@List} = ();
$t1 = new Benchmark;
$td = timediff($t1, $t0);
if (exists $seen{"zzzz"}) {
  print "Found it! (hash)\n";
}
print "hash took: ", timestr($td), "\n";

$size = (split(" ", `ps -hlp $$`))[6];
print "Size: $size\n";

To make this a worst case for the linear teams, I searched for the last item in the list. As it turns out, the least appealing code is the fastest - but while the hash approach may not be kissing performance inefficiency on its ratty lips, it's certainly been caught in some kind of carnal embrace with memory inefficiency. Here are the results on my box:

./existance.pl
Elements:456976
Found it! (sub)
sub took: 2 wallclock secs ( 1.49 usr + 0.01 sys = 1.50 CPU)
Size: 58632
Found it! (scan)
scan took: 1 wallclock secs ( 0.40 usr + 0.00 sys = 0.40 CPU)
Size: 58632
Found it! (hash)
hash took: 2 wallclock secs ( 0.90 usr + 0.15 sys = 1.05 CPU)
Size: 88324

Of course, we've strayed far from meditation and into experimentation. I'm sorry, what were we optimizing for again? :-)

Peace,
-McD
Re: Re: Re: Beauty is in the eye of the beholder by japhy (Canon) on Feb 21, 2001 at 02:47 UTC
You quoted me, and then didn't follow my lead. "But if you do this a lot of times, you're kissing inefficiency on its ratty lips." You performed these tests ONCE each. Use the `first()` function several times. Scan through the array several times. Create the hash ONCE, and use `exists()` many times.

japhy -- Perl and Regex Hacker
Re: Re: Re: Re: Beauty is in the eye of the beholder by McD (Chaplain) on Feb 21, 2001 at 05:58 UTC
Nuts. You're absolutely right. I misunderstood what you meant at first - now I see. This is a perfect example of the classic tradeoff of speed vs. memory. Which brings us back to meditation, after all, doesn't it?

Time for a beer, methinks. All this tinkering and meditating is thirsty work. Thanks for following up!

Peace,
-McD
Re: Re: Re: Beauty is in the eye of the beholder by japhy (Canon) on Feb 21, 2001 at 03:02 UTC
Here's my test. `first()` can be made faster by passing an array reference, not the array.

(RESULTS)

Elements:17576
sub took: 52 (51.14 usr + 0.00 sys = 51.14 CPU)
scan took: 17 (13.85 usr + 0.00 sys = 13.85 CPU)
hash took: 0 ( 0.17 usr + 0.13 sys = 0.30 CPU)

(CODE)

#!/usr/bin/perl -w
use Benchmark;

@List = ("aaa" .. "zzz");
print "Elements:", scalar @List, "\n";
@rand_list = map $List[rand @List], 1 .. 100;

sub first (&@) {
  my $cref = shift;
  $cref->($_) and return $_ for @_;
}

$t0 = new Benchmark;
for (@rand_list) {
  $rand_element = $_;
  $first_match = first {$_ eq $rand_element } @List;
}
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "sub took: ", timestr($td), "\n";

$t0 = new Benchmark;
for (@rand_list) {
  $rand_element = $_;
  for (@List) { last if $_ eq $rand_element }
}
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "scan took: ", timestr($td), "\n";

$t0 = new Benchmark;
my %seen;
@seen{@List} = ();
$t1 = new Benchmark;
$td = timediff($t1, $t0);
for (@rand_list) { 1 if exists $seen{$_} }
print "hash took: ", timestr($td), "\n";

japhy -- Perl and Regex Hacker
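One thing worth noting about the script above: the hash branch takes $t1 before the exists() loop runs, so its figure covers building %seen but not the 100 lookups. Below is a hedged sketch of how the comparison could be rerun with the core Benchmark module's timethese() and cmpthese(), timing the whole of each approach. The smaller list, the 20 needles, the iteration count, and the %hits sanity counter are arbitrary choices for illustration, and no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use List::Util qw(first);

my @List      = ("aa" .. "zz");                      # 676 elements
my @rand_list = map { $List[rand @List] } 1 .. 20;   # needles, all present

my %hits;   # needles found by each approach, as a sanity check

my $results = timethese(300, {
    first => sub {
        my $n = 0;
        for my $needle (@rand_list) {
            my $m = first { $_ eq $needle } @List;
            $n++ if defined $m;
        }
        $hits{first} = $n;
    },
    scan => sub {
        my $n = 0;
        for my $needle (@rand_list) {
            for (@List) {
                if ($_ eq $needle) { $n++; last }
            }
        }
        $hits{scan} = $n;
    },
    hash => sub {
        my %seen;
        @seen{@List} = ();   # rebuilt on every run so the build cost is counted
        my $n = 0;
        for my $needle (@rand_list) {
            $n++ if exists $seen{$needle};
        }
        $hits{hash} = $n;
    },
}, 'none');

# Print a rate table comparing the three approaches.
cmpthese($results);
```

Rebuilding %seen inside the timed sub is deliberately pessimistic for the hash approach; hoisting the build out of the sub would measure the lookups alone.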
Re: Re: Beauty is in the eye of the beholder by ChOas (Curate) on Feb 20, 2001 at 18:19 UTC
;))) As you might have read, I would have used a hash.... I was trying to find an example of beauty/efficiency sad enough, that was the best I could come up with ;))

I would have probably used 'eq' instead of the regex too ;)))

GreetZ!,
ChOas

print "profeth still\n" if /bird\|devil/;
Re: Re: Re: Beauty is in the eye of the beholder by japhy (Canon) on Feb 20, 2001 at 18:22 UTC
Well, an interesting feature of Perl is that by using its data structures, one can often make a structure representing some complex algorithm (like a hash being used to eliminate duplicates).

japhy -- Perl and Regex Hacker
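The duplicate-elimination case is commonly written as the idiom below; a small sketch with made-up input in @words.

```perl
use strict;
use warnings;

# Hypothetical input with repeats.
my @words = qw(bird devil bird prophet devil);

# %seen counts occurrences. The post-increment yields a false value the
# first time a word is met, so grep keeps only first occurrences, in order.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

print "@unique\n";
```

Here grep is the right tool, since every element has to be visited once anyway; the hash is what replaces the nested scan.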