http://qs1969.pair.com?node_id=1089040

Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

So, 5 years ago I was trying to efficiently run 80 regexes over a 10K document. That quest ended in success. Now I am wrestling with filtering out the lines of an array that match any entry in an array of regexes. I am giving 'grep' a shot, but am having a problem generalizing a single test into multiple tests.

--Thanks---
Thanks for the quick response. Another nifty tool!!

Perldoc shows:

@foo = grep {!/^#/} @bar; # weed out comments
I want to replace that single regex with a loop over an array of 'qr's.
my @regs = (
    qr/split/,
    qr/se.d/,
    qr/open/,
    qr/print/,
    # might be another 100 in here
    #
);
my @bar;
my @foo;
push @foo, "still empty";
open INP , "<../IPutils.pm";    #random code
while (<INP>){
    push @bar, $_;
}
print @bar, "\n--EOD-----------\n";
@foo = grep{
    # !/^\s*#/       #sample from Perldocs
    #{               #causes compile error
    #(               #causes compile error
    my $final =1;    #default true
    foreach my $r (@regs){
        $final = 0 if ( $r );    #match means drop this line
    }
    $final;          #last value of the block hmmmmm
    #)
    #}
}@bar;
print @foo;
'return' shouldn't be the correct mechanism to tell grep true/false. But when I run this, the second print (@foo) produces nothing.

It is always better to have seen your target for yourself, rather than depend upon someone else's description.

Replies are listed 'Best First'.
Re: grep with looped tests
by LanX (Saint) on Jun 06, 2014 at 15:31 UTC
    I couldn't resist solving this with 2 nested grep! =)

    DB<100> @x=1..10
     => (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    DB<101> @re=qw/ 3 7 1 /
     => (3, 7, 1)
    DB<102> grep { my $data=$_; grep { $data =~ /$_/ } @re } @x
     => (1, 3, 7, 10)
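The nested grep above returns the items that do match. Since the original question asks to filter matching lines out, here is a minimal sketch of the negated form, reusing the same sample data. The name @kept is only illustrative.

```perl
use strict;
use warnings;

my @x  = 1 .. 10;
my @re = qw/ 3 7 1 /;

# The leading ! puts the inner grep in scalar context, so it yields the
# number of patterns that matched; negating it keeps only the items
# that match none of the patterns.
my @kept = grep { my $data = $_; !grep { $data =~ /$_/ } @re } @x;

print "@kept\n";
```

Note that the inner grep always tests every pattern, even after one has matched.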

    Cheers Rolf

    (addicted to the Perl Programming Language)

      Similarly, using first from List::Util to short circuit the inner loop when you match a line :)

      use v5.18;
      use warnings;
      use List::Util 'first';
      use Data::Dumper;

      my @data = 0..10;
      my @re = (3,7,9);
      my @res = grep {my $line = $_; !defined first {$line =~ $_} @re } @data;
      say Dumper(\@res);
        Ah, I always forget that List::Util (which is core) now has any, too!

        (I'm still hooked on List::MoreUtils, see next answer :)

        > my @res = grep {my $line = $_; !defined first {$line =~ $_} @re } @data;

        But I think you'd certainly prefer any over first to avoid !defined ! :)
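A minimal sketch of the same filter written with any instead of first, assuming a List::Util recent enough to provide it (any was added in List::Util 1.33). The data is the same sample as in the snippet quoted above.

```perl
use strict;
use warnings;
use List::Util 1.33 'any';    # any() is available from List::Util 1.33 on

my @data = 0 .. 10;
my @re   = ( 3, 7, 9 );

# Keep an item only if none of the patterns matches it.
# any() stops at the first pattern that matches, like first() does.
my @res = grep { my $line = $_; !any { $line =~ /$_/ } @re } @data;

print "@res\n";
```

Unlike first, any returns a plain boolean, so the !defined test is no longer needed.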

        update

        though first was part of my 5.10 distribution and any wasn't!

        Cheers Rolf

        (addicted to the Perl Programming Language)

      This is a quite nice construct. I had a somewhat similar problem to solve a couple of weeks ago and did not think about such a clever and concise solution. Having said that, the explicit loop I used was probably not less efficient. But I wish I had thought about such a solution as yours.

      Rolf, if I may, a brief side question: which debugger option are you using to display the last evaluated expression directly, without having to print it explicitly? I mean, when you have this:

      DB<100> @x=1..10
       => (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      what option do you use to have the second line above printed seemingly automatically? Thanks for your response, and sorry for being off-topic.

        Unfortunately there is no option; I patched the code (after a hint here from pemungkah).

        Actually, I monkey-patched some lines from a config file.

        I can share later if you want, I'm mobile ATM.

        Cheers Rolf

        (addicted to the Perl Programming Language)

Re: grep with looped tests
by betterworld (Curate) on Jun 06, 2014 at 15:31 UTC
    $final = 0 if ( $r );

    You need to change $r to $something =~ $r. Using a regex variable as an expression by itself will not do any matching: a qr object is always true, so $final is set to 0 for every line and @foo ends up empty.
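A minimal sketch of the corrected loop, with $_ (the line grep is currently testing) bound to each regex. The contents of @bar are hypothetical sample lines standing in for the file read in the question, and the variable is renamed $keep here for clarity.

```perl
use strict;
use warnings;

my @regs = ( qr/split/, qr/se.d/, qr/open/, qr/print/ );

# Hypothetical sample lines; the question reads these from a file.
my @bar = (
    'my @parts = split /,/, $line;',
    'my $total = 0;',
    'open my $fh, q{<}, $path or die;',
    'send($sock, $msg, 0);',
    '$total += $_ for @parts;',
);

my @foo = grep {
    my $keep = 1;                # default: keep the line
    foreach my $r (@regs) {
        if ( $_ =~ $r ) {        # actually match the line against the regex
            $keep = 0;           # a match means drop this line
            last;                # skip the remaining regexes
        }
    }
    $keep;                       # last value of the block tells grep keep/drop
} @bar;

print "$_\n" for @foo;
```

The last statement of the block is what grep evaluates, so no return is needed.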

    It might be more efficient to drop the loop and build a combined expression like

    $something !~ /$r1|$r2|$r3|$r4|.../;
    Maybe Regexp::Optimizer can make that long expression even faster.
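A minimal sketch of the combined-expression idea using only core Perl, building the alternation from the array with join rather than writing it out by hand. The name $combined and the sample lines in @bar are illustrative assumptions; whether this is faster than the loop depends on the patterns and is not measured here.

```perl
use strict;
use warnings;

my @regs = ( qr/split/, qr/se.d/, qr/open/, qr/print/ );

# Each qr object stringifies to a self-contained group such as (?^:split),
# so joining them with | gives one valid alternation, compiled once.
my $combined = do {
    my $alternation = join '|', @regs;
    qr/$alternation/;
};

# Hypothetical sample lines.
my @bar = (
    'my @parts = split /,/, $line;',
    'my $total = 0;',
    'print "done\n";',
    '$total += $_ for @parts;',
);

# Keep only the lines that match none of the patterns.
my @foo = grep { $_ !~ $combined } @bar;

print "$_\n" for @foo;
```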

Re: grep with looped tests
by LanX (Saint) on Jun 06, 2014 at 15:46 UTC
    > return shouldn't be the correct mechanism to tell grep true/false.

    One quirk in Perl is that blocks of grep and map are (unfortunately) not anonymous subs.

    So return tries to exit the surrounding sub, not the block!

    You can, however, pass a real sub:

    > perl
    my @x=1..10;
    my @re=qw/ 3 7 1 /;
    sub check {
        for my $re (@re) {
            return 1 if $_ =~ /$re/
        }
        return 0;
    }
    print grep &check, @x;
    __END__
    13710
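A small sketch of the quirk described above, assuming the grep sits inside a named sub. The names filter_broken, filter_ok and $check are hypothetical and used only for this illustration.

```perl
use strict;
use warnings;

my @re = qw/ 3 7 1 /;

# return inside the grep block does not end the block: it leaves
# filter_broken() itself as soon as the first item matches.
sub filter_broken {
    my @kept = grep {
        for my $re (@re) { return 'left the sub early' if /$re/ }
        0;
    } @_;
    return "kept: @kept";
}

# A real (anonymous) sub has its own return; grep only sees its value.
my $check = sub {
    for my $re (@re) { return 1 if $_ =~ /$re/ }
    return 0;
};

sub filter_ok {
    my @kept = grep { $check->() } @_;
    return "kept: @kept";
}

print filter_broken( 1 .. 10 ), "\n";
print filter_ok( 1 .. 10 ),     "\n";
```

The first call never reaches its final return statement, while the second returns the list of matching items as intended.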

    update

    see also any in List::MoreUtils if you don't wanna reinvent the wheel! =)

    Cheers Rolf

    (addicted to the Perl Programming Language)