Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

RFC: Text::Grap

by kwaping (Priest)
on Jul 12, 2006 at 21:24 UTC ( [id://560827]=perlmeditation: print w/replies, xml ) Need Help??

I was thinking about how grep is so easy to use, yet must iterate through every item of a list, becoming a problem if you have a large list. I came up with this simple module as a counterpart if you just want to know if an item is a list member or not.

This is version 0.01 so I'm open to suggestions!

My goal is to create a function that has as similar of an iterface as possible to the grep builtin, but can be used more effectively in a logical yes/no context.

I'm aware that my ($item) = grep(/$pat/,@list); stops grep upon the first successful match. However, I feel that this code is a little awkward to use in the aforementioned logical context.
package Text::Grap; require Exporter; @ISA = qw(Exporter); @EXPORT_OK = qw( grap ); sub grap (&@) { my ($sub,@list) = @_; foreach (@list) { return 1 if ($sub->($_)); } } 1;
And to use it...
#!/usr/bin/perl use strict; use warnings; use Text::Grap qw(grap); my @questions = qw(who what when where why); print grap(sub {/he/}, @questions) ? 'yes' : 'no';


Update 1: Code changed to reflect betterworld's suggestions.

---
It's all fine and dandy until someone has to look at the code.

Replies are listed 'Best First'.
Re: RFC: Text::Grap
by betterworld (Curate) on Jul 12, 2006 at 21:38 UTC

    grap returns the element that matched, which may be 0 or some other false value. So you can't use this module to find out if 0 is a member of some array. (This is no longer true since you've edited your post.)

    You could use the prototype (&@) to make your grap easier to call.

    By the way, your subroutine is not different from first() in List::Util. (This is no longer true, either.)

Re: RFC: Text::Grap
by adrianh (Chancellor) on Jul 13, 2006 at 07:06 UTC
Re: RFC: Text::Grap
by duff (Parson) on Jul 13, 2006 at 03:23 UTC
    I'm aware that my ($item) = grep(/$pat/,@list); stops grep upon the first successful match.

    Actually, I don't think that's true (unless it's some optimization that's been applied to recent perls). What it does is generate the entire list and then throw away all of the items except for the first one.

    But, since I haven't kept up with all of the optimizations that have been applied to perl over time, I don't know if that's still the case.

    Anyway, what it looks like you've done is re-invent a specific application of List::Util::first.

Re: RFC: Text::Grap
by kwaping (Priest) on Jul 13, 2006 at 15:14 UTC
    Thank you, everyone, for your insightful replies. I have decided to scrap this project and use either scalar grep or List::Utils instead. I'm embarrassed that I didn't do enough research beforehand, but at least I was smart enough to post this RFC. :) Score one more for the monastery!

    ---
    It's all fine and dandy until someone has to look at the code.
Re: RFC: Text::Grap
by VSarkiss (Monsignor) on Jul 13, 2006 at 14:19 UTC
Re: RFC: Text::Grap
by Ieronim (Friar) on Jul 13, 2006 at 12:31 UTC
    In simple cases there is NO performance gain in using your function or even XS-based first() of List::Util :)
    grep in scalar context is much faster :)

    UPD: Tanktalus found a bug in my benchmark, so i corrected it :) The first function from List::Util beats the grep-based solution if the "good" element appears in the first middle of the array.

    Have a look at the benchmark:

    And at its result (the uppercase words denote the position of the searched element in the array):
    START: Rate grap grep first grap 638/s -- -14% -99% grep 746/s 17% -- -99% first 57971/s 8983% 7672% -- MIDDLE: Rate grap first grep grap 175/s -- -76% -77% first 735/s 319% -- -3% grep 758/s 332% 3% -- END: Rate grap first grep grap 102/s -- -73% -86% first 377/s 269% -- -50% grep 751/s 635% 99% --

      I'm not really convinced you're testing what you think you're testing. That's because you're attempting to close on the lexical start/middle/end variables, but the string is being eval'd way outside of that scope. That's why, when possible, I prefer to see anonymous code refs instead. So I tried converting your code to that - and, while I was at it, I made a minor optimisation to grap where I didn't copy all the elements to a lexical variable first ("grap2"). And then another minor optimisation to stop copying anything ("grap3").

      And the results are much different from yours:
      START: Rate grap grep grap2 grap3 first grap 3397/s -- -37% -93% -93% -99% grep 5381/s 58% -- -89% -89% -99% grap2 47287/s 1292% 779% -- -3% -92% grap3 48596/s 1330% 803% 3% -- -92% first 580886/s 16998% 10695% 1128% 1095% -- MIDDLE: Rate grap grap3 grap2 first grep grap 1152/s -- -22% -23% -75% -79% grap3 1486/s 29% -- -0% -67% -72% grap2 1493/s 30% 0% -- -67% -72% first 4567/s 297% 207% 206% -- -15% grep 5381/s 367% 262% 260% 18% -- END: Rate grap grap3 grap2 first grep grap 756/s -- -15% -16% -69% -86% grap3 891/s 18% -- -1% -63% -84% grap2 898/s 19% 1% -- -63% -84% first 2400/s 218% 170% 167% -- -56% grep 5493/s 627% 517% 512% 129% --
      Which tells me that the second optimisation I made (grap3 vs grap2) isn't much of one ;-)

      However, these are much lower numbers for rates which tells me that they are looping each time. I have a really fast machine already - I don't see how you could loop through 10,000 elements via grep, and do that 2.3 million times per second - that's 23 billion function calls per second, way past believability in any language on a single CPU.

        I just realised that we both did not include in the benchmark the solution that beats first() in all cases—an inlined foreach loop :)

        Benchmark code:

        Results on my machine:

        START: Rate grap grep grap2 grap3 first foreach grap 605/s -- -19% -93% -93% -99% -100% grep 745/s 23% -- -91% -91% -99% -100% grap2 8416/s 1291% 1030% -- -2% -84% -99% grap3 8566/s 1316% 1050% 2% -- -84% -99% first 53924/s 8811% 7140% 541% 530% -- -95% foreach 1005403/s 166050% 134888% 11846% 11637% 1764% -- MIDDLE: Rate grap grap2 grap3 first grep foreach grap 169/s -- -31% -32% -77% -78% -84% grap2 244/s 45% -- -2% -67% -68% -77% grap3 249/s 47% 2% -- -67% -68% -76% first 746/s 342% 205% 200% -- -3% -29% grep 772/s 358% 216% 210% 3% -- -27% foreach 1056/s 526% 332% 325% 42% 37% -- END: Rate grap grap3 grap2 first foreach grep grap 99.3/s -- -18% -20% -73% -81% -86% grap3 121/s 22% -- -3% -68% -77% -84% grap2 124/s 25% 3% -- -67% -76% -83% first 372/s 275% 208% 199% -- -29% -49% foreach 526/s 430% 335% 323% 41% -- -28% grep 735/s 640% 508% 491% 98% 40% --
        Thank you :) I was surprised by the results of the benchmark as they were countrary to the common sense :), but could not figure out the bug :) All i needed was to replace my with our :) I'll update my benchmark.
Re: RFC: Text::Grap
by radiantmatrix (Parson) on Jul 20, 2006 at 19:30 UTC

    If you test a particular array for the existance of many values, it might be faster/easier to understand to use the Schwartzian Transform. For example:

    my $matches = 0; foreach my $item (@items) { $matches++ if grep( $item eq $_, @list ); } print "All match!" if $matches == @items;

    becomes:

    my %st = map { $_ => undef } @list; my $matches = 0; foreach my $item (@items) { $matches++ if exists $st{$item}; } print "All match!" if $matches == @items;

    The larger @items gets (relative to @list), the better the latter performs relative to the former. I don't know where the 0-point lies -- that is, I don't know at what point it becomes faster to use the latter form -- so use of Benchmark and Devel::DProf (and friends) would be essential.

    It would be interesting (though decidedly niche) to see a module where the %st hash would be automatically maintained every time a particular array was updated. I'd probably use overload to create an interface like:

    use List::AutoST; my $list = List::AutoST->new(); # these appear to behave like normal push @list, $some, $values, $here; $list[1] = $value2; # but then... print "Ok!" if $list->has('this value'); # or (also via overload) print "Ok!" if exists $list{'this value'};

    I don't know about performance differences for that, but it would make an interesting and eas(y|ier) approach.

    <radiant.matrix>
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://560827]
Approved by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (4)
As of 2024-03-28 13:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found