tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file, and I want to keep only the lines that match certain patterns and drop the rest. I have an array, @matcharray, in which each element is a regexp. So I can do

open file
@some_array = <file>
foreach $pattern (@matcharray) {
    grep { /$pattern/ } @some_array
}

Is there a more efficient way? The file read into @some_array can be very large, so first creating such a big array and then scanning it once per pattern looks inefficient. Moreover, this will spoil the order of the lines. I want to find all lines that match any of the regexps in the list.

Re: Usage of grep on a file
by GrandFather (Saint) on Apr 07, 2006 at 07:26 UTC

    Probably you want to combine Regexp::Assemble with inplace edit and do something like:

    use Regexp::Assemble;

    my $ra = Regexp::Assemble->new;
    $ra->add ($_) for @matchArray;
    my $re = $ra->re;

    @ARGV = ('filename');
    $^I = '.bak';
    while (<>) {
        print if m/$re/;
    }

    DWIM is Perl's answer to Gödel
      Thanks. If I don't want to use that module, can I do this instead?

      $full_string = "";
      foreach (@match_array) {
          $full_string = $full_string."|".$_;
      }

      and then do a grep with $full_string as the match pattern?

        If the file is large, don't slurp it; use a loop to process one line at a time. You could do it this way:

        # open file
        # open outFile
        while (<file>) {
            my $line = $_;
            my $match = 0;
            for (@matcharray) {
                ($match = 1), last if $line =~ m/$_/;
            }
            print outFile $line if $match;
        }

        DWIM is Perl's answer to Gödel
        # ... which is almost the same as (join leaves out the leading "|") ...
        $full_string = join '|', @match_array;
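The hand-built alternation discussed above can be sketched as follows. This is a minimal version that assumes every pattern is a valid regex fragment and that there is at least one pattern; `build_matcher` and the sample lines are hypothetical names used only for illustration. Wrapping each pattern in `(?:...)` keeps an alternation inside one pattern from spilling into its neighbours, and `join` avoids a leading "|", which would add an empty alternative that matches every line.

```perl
use strict;
use warnings;

# Hypothetical helper: combine several patterns into one compiled regex.
# Assumes at least one pattern is supplied.
sub build_matcher {
    my @patterns = @_;
    # (?:...) groups each pattern without capturing.
    my $alternation = join '|', map { "(?:$_)" } @patterns;
    return qr/$alternation/;
}

my $re = build_matcher('^foo', 'bar$', 'b[aeiou]z');

# Sample data stands in for lines read from a file.
my @lines    = ("foo one\n", "nothing here\n", "ends with bar\n", "a buz b\n");
my @filtered = grep { /$re/ } @lines;

print @filtered;
```

Because each line is tested once against the combined regex, the surviving lines keep their original order.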
Re: Usage of grep on a file
by Zaxo (Archbishop) on Apr 07, 2006 at 07:34 UTC

    Can we assume that each pattern matches within a single line? If so, then let's make @matcharray an array of qr// expressions, ordered by how often we expect a match, commonest first.

    That all means you can read the input file line by line and print that line to an output file if a match is found:

    my @matcharray = map {qr/$_/} (
        "all",
        "your",
        "[patterns]",
    );
    open my $fh, '<', '/path/to/original/file' or die $!;
    open my $of, '>', '/path/to/filtered/file' or die $!;
    {
        local $_;
        while (<$fh>) {
            for my $pat (@matcharray) {
                /$pat/ and print $of $_ and last;
            }
        }
    }
    Once a match is found, the line is printed to output (with the assumption that print succeeds) and further match tests are skipped.
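The same line-by-line idea can also be sketched with `any` from List::Util, which stops testing at the first pattern that matches. This is only an illustration, assuming a List::Util recent enough to export `any` (1.33 or later); `filter_lines` is a hypothetical helper, and in-memory filehandles stand in for the real input and output files.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical helper: copy lines matching any pattern from $in to $out.
# Returns the number of lines kept.
sub filter_lines {
    my ($in, $out, @patterns) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        # any() stops at the first pattern that matches.
        next unless any { $line =~ $_ } @patterns;
        print {$out} $line;
        $kept++;
    }
    return $kept;
}

# In-memory filehandles stand in for the real input and output files.
my $input  = "alpha\nbeta\ngamma\ndelta\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";

my $kept = filter_lines($in, $out, qr/^a/, qr/mm/);
close $out;

print $output;
```

Only one line is held in memory at a time, so the size of the input file does not matter.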

    After Compline,
    Zaxo

Re: Usage of grep on a file
by rafl (Friar) on Apr 07, 2006 at 07:36 UTC

    First of all, use qr// to precompile your patterns:

    my @compiled_patterns = map { qx/$_/ } @matcharray;

    Also don't slurp the file into an array. Use something more memory-efficient:

    OUTER: while (my $line = <$file>) {
        INNER: for my $pattern (@compiled_patterns) {
            # Parenthesise print's argument; otherwise "last" runs before anything is printed.
            print($line), last INNER if $line =~ $pattern;
        }
    }

    Cheers, Flo

      Your text says "qr//" but your code says "qx//". Using qx// in this situation could prove very interesting! :-)
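To make the difference concrete: qr// compiles its contents into a regex object, whereas qx// runs its contents as a shell command and returns the command's output. A minimal sketch of the qr// form, with made-up patterns and a made-up test string chosen purely for illustration:

```perl
use strict;
use warnings;

# Example patterns, invented for this sketch.
my @matcharray = ('^foo', 'ba[rz]');

# qr// compiles each string into a reusable regex object.
my @compiled_patterns = map { qr/$_/ } @matcharray;

# Each element is a Regexp object, not a plain string.
my @kinds = map { ref($_) } @compiled_patterns;

# grep in scalar context counts how many patterns match the string.
my $hits = grep { 'foobar' =~ $_ } @compiled_patterns;

print "kinds: @kinds, hits: $hits\n";
```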
Re: Usage of grep on a file
by aquarium (Curate) on Apr 07, 2006 at 12:17 UTC
    First of all, unless your filename is always going to be hardcoded, read from STDIN in the script, and either redirect input from the file or list one or more input files when calling the script, e.g.
    perl myscript <file
    or
    perl myscript file1 file2 file3
    perl automagically knows how to feed the script with the input from the file(s)
    then the code becomes
    my $regex = '(' . join(')|(', @matcharray) . ')';
    print "$regex\n";
    while (my $line = <>) {
        chomp $line;
        print "$line\n" if ($line =~ /$regex/);
    }
    the hardest line to type correctly is: stty erase ^H
Re: Usage of grep on a file
by ambrus (Abbot) on Apr 07, 2006 at 10:17 UTC

    Cgrep is my reimplementation of grep in perl.