clerew has asked for the wisdom of the Perl Monks concerning the following question:

I have a file with lines of the form

/dev/sda4 6 Fri Apr 12 04:30:02 2019 +0100   /dumpx/DUMP4X/var/level6/55    86.53 MB

and I have another file with lines of the form

/dumpx/DUMP4X/var/level6/55

and I would like to find (and remove) all the lines in the first file which contain the strings in the second file. Fortunately, I can assume that both files are sorted in the same order.

To find a single entry is simple:

if $line =~ m?/dumpx/DUMP4X/var/level6/55? then do something

but what I want to do is to read lines from the two files in a suitable loop, picking out the matches, so I need a $regexp variable

So I write

$regexp = "m?" . <STDIN> . "?";

(the second file is actually coming from a pipe)

and then I test

if $line =~ $regexp ...

and if it matches, then I do the necessary stuff, and fetch the next $line and obtain the next $regexp.

But the =~ operator has been cunningly designed so that this does not work. Essentially, if the RHS of =~ is a variable (my $regexp), then there is a builtin assumption that it uses '/' delimiters which, for my case, are totally unsuitable.

So how do I do this job?

Replies are listed 'Best First'.
Re: Regex variables with delimiters
by AnomalousMonk (Archbishop) on May 18, 2019 at 01:08 UTC
    $regexp = "m?" . <STDIN> . "?";

    Also be aware that  m?pattern? or  ?pattern? is a special case of the match operator:

    m?PATTERN?msixpodualgc
    ?PATTERN?msixpodualgc
    This is just like the "m/PATTERN/" search, except that it
    matches only once between calls to the reset() operator. ...
    See  m?...? in Regexp Quote-Like Operators in perlop.

    Update: See also  qr// in the above section of perlop for more info on interpolating strings into regexes. And see also perlre, perlretut, and perlrequick.
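A minimal sketch of the qr// approach, assuming the incoming string should be matched literally (hence the \Q...\E). The helper name literal_regex is made up for illustration. It also shows that a compiled pattern needs no particular delimiter at the point of use.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a plain string into a regex that
# matches it literally. \Q...\E escapes any regex metacharacters.
sub literal_regex {
    my ($string) = @_;
    chomp $string;              # drop a trailing newline, if any
    return qr{\Q$string\E};     # braces as delimiters, so '/' needs no escaping
}

my $regex = literal_regex("/dumpx/DUMP4X/var/level6/55\n");
my $line  = "/dev/sda4 6 Fri Apr 12 04:30:02 2019 +0100   /dumpx/DUMP4X/var/level6/55    86.53 MB";

# A qr// object can stand alone on the right of =~ ...
print "bare:  match\n" if $line =~ $regex;

# ... or be interpolated into a match with any delimiter.
print "m{}:   match\n" if $line =~ m{$regex};
print "m#..#: match\n" if $line =~ m#$regex#;
```

The delimiter only matters when the pattern is written out in source code; once the pattern lives in a variable, the slashes in the data are ordinary characters.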


    Give a man a fish:  <%-{-{-{-<

Re: Regex variables with delimiters
by AnomalousMonk (Archbishop) on May 18, 2019 at 01:26 UTC
    ... the =~ operator has been cunningly designed so that ... there is a builtin assumption that it uses '/' delimiters ...

    A counterexample:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $raw = '/this/stuff/';
     print 'raw: yes' if 'does/this/stuff/match?' =~ $raw;
     ;;
     my $regex = qr/$raw/;
     print 'regex: yes' if 'and/this/stuff/also?' =~ $regex;
    "
    raw: yes
    regex: yes



      Yes, that works, but my copy of the O'Reilly Camel book seems to suggest otherwise :-(.

      Anyway, after a Good Night's sleep I worked out how I should have done it:

      $regexp = <STDIN>

      probably with a chomp somewhere, and then

      $line =~ m?$regexp?

      That seems to work fine, so thanks to all who replied.

      Additional remark on May 20th

      Actually, that didn't work. Due to the special peculiarities of m?xxxxx? matches, you need to call reset whenever you change $regexp. It is much simpler just to use a different delimiter, so here is my final version, now working as intended:

      $line =~ m#$regexp#
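
A small sketch of the difference described above, assuming a Perl where m?...? is still available. The two helper names are invented for this illustration. The first loop stops matching after its first success until reset() is called; the second, with an ordinary delimiter, matches every time.

```perl
use strict;
use warnings;

# Count matches using m?...?, which succeeds at most once until reset().
sub count_with_question_marks {
    my ($pattern, @lines) = @_;
    my $count = 0;
    for (@lines) {
        $count++ if m?$pattern?;
    }
    reset;      # re-arm the m?...? above for the next call
    return $count;
}

# Same loop with an ordinary delimiter: every matching line counts.
sub count_with_hash_marks {
    my ($pattern, @lines) = @_;
    my $count = 0;
    for (@lines) {
        $count++ if m#$pattern#;
    }
    return $count;
}

my @lines = ("level6/55", "level6/55 again", "level7/58");
print "m?...? matched ", count_with_question_marks("level6/55", @lines), " line(s)\n";
print "m#...# matched ", count_with_hash_marks("level6/55", @lines), " line(s)\n";
```
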
        ...  $line =~ m?$regexp? ... That seems to work fine ...

        Again,  m?...? is a special case of the  m// operator (per this), but if you're happy with its one-match-only behavior, I'm happy too! :)



Re: Regex variables with delimiters
by johngg (Canon) on May 18, 2019 at 11:22 UTC

    Perhaps you could read all of the lines from your second file, before starting to process the first, and construct a single regular expression with all of the strings you want to exclude.

    johngg@shiraz:~/perl/utils$ perl -Mstrict -Mwarnings -E '
    open my $file2FH, q{<}, \ <<__EOD2__ or die $!;
    /dumpx/DUMP4X/var/level6/55
    /dumpx/DUMP4X/var/level7/58
    __EOD2__

    my $rxExcl = do {
        chomp( my @lines = <$file2FH> );
        local $" = q{|};
        qr{@lines};
    };
    close $file2FH or die $!;

    open my $file1FH, q{<}, \ <<__EOD1__ or die $!;
    /dev/sda3 6 Fri Apr 12 04:27:19 2019 +0100 /dumpx/DUMP3X/var/level6/47 81.34 MB
    /dev/sda4 6 Fri Apr 12 04:30:02 2019 +0100 /dumpx/DUMP4X/var/level6/55 86.53 MB
    /dev/sdb1 6 Fri Apr 12 04:31:47 2019 +0100 /dumpx/DUMP4X/var/level6/56 27.73 MB
    /dev/sdb2 6 Fri Apr 12 04:33:33 2019 +0100 /dumpx/DUMP4X/var/level7/58 57.32 MB
    __EOD1__

    while ( <$file1FH> )
    {
        print unless m{$rxExcl};
    }
    close $file1FH or die $!;'
    /dev/sda3 6 Fri Apr 12 04:27:19 2019 +0100 /dumpx/DUMP3X/var/level6/47 81.34 MB
    /dev/sdb1 6 Fri Apr 12 04:31:47 2019 +0100 /dumpx/DUMP4X/var/level6/56 27.73 MB

    I hope this is useful.

    Cheers,

    JohnGG

Re: Regex variables with delimiters
by tybalt89 (Monsignor) on May 18, 2019 at 00:43 UTC
    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=11100208

    use strict;
    use warnings;

    my $line = "/dev/sda4 6 Fri Apr 12 04:30:02 2019 +0100 /dumpx/DUMP4X/var/level6/55 86.53 MB\n";
    my $secondline = "/dumpx/DUMP4X/var/level6/55\n";

    my $regex = quotemeta $secondline =~ tr/\n//dr;

    if( $line =~ $regex )
      {
      print "matches\n";
      }
    else
      {
      print "does not match\n";
      }

Re: Regex variables with delimiters
by hippo (Archbishop) on May 18, 2019 at 09:01 UTC

    The "patterns" in your second file which you are using to match the lines in your first file are not regexen - they are just strings. To look for a string inside another string use index.
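
A rough sketch of how index could drive the lock-step loop the question describes. It assumes both lists are sorted in the same order and that every string in the second list occurs in exactly one line of the first; the helper name drop_listed is hypothetical, and the sample lines are taken from johngg's reply.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of @$lines that do not contain
# the corresponding entry of @$wanted. Assumes both lists are sorted in
# the same order and each entry of @$wanted occurs in exactly one line.
sub drop_listed {
    my ($lines, $wanted) = @_;
    my @kept;
    my $next = 0;                       # index of the string we expect next
    for my $line (@$lines) {
        if ($next < @$wanted && index($line, $wanted->[$next]) >= 0) {
            $next++;                    # found it: drop the line, move on
        }
        else {
            push @kept, $line;          # no match: keep the line
        }
    }
    return @kept;
}

my @log = (
    "/dev/sda3 6 Fri Apr 12 04:27:19 2019 +0100 /dumpx/DUMP3X/var/level6/47 81.34 MB",
    "/dev/sda4 6 Fri Apr 12 04:30:02 2019 +0100 /dumpx/DUMP4X/var/level6/55 86.53 MB",
    "/dev/sdb1 6 Fri Apr 12 04:31:47 2019 +0100 /dumpx/DUMP4X/var/level6/56 27.73 MB",
    "/dev/sdb2 6 Fri Apr 12 04:33:33 2019 +0100 /dumpx/DUMP4X/var/level7/58 57.32 MB",
);
my @remove = ("/dumpx/DUMP4X/var/level6/55", "/dumpx/DUMP4X/var/level7/58");

print "$_\n" for drop_listed(\@log, \@remove);
```

With this sample data the sda3 and sdb1 lines are kept. Note that index does a plain substring test, so a shorter string such as one ending in /5 would also be found inside a path ending in /55.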

Re: Regex variables with delimiters
by jwkrahn (Abbot) on May 18, 2019 at 00:38 UTC
    $regexp = "m?" . <STDIN> . "?";

    This won't work because <STDIN> returns the line with its trailing newline still attached, and that newline won't match your example.

    chomp( my $input = <STDIN> );
    $regexp = qr/$input/;
    if $line =~ ?$regexp? ...

      if $line =~ ?$regexp?

      But see the caution about  ?...? here.



Re: Regex variables with delimiters
by Jack_Langsdorf (Initiate) on May 19, 2019 at 17:58 UTC
    I would suggest that you not use perl. grep has built in switches to handle exactly this case.

     grep -v -f file_of_filters logfile > filtered_logfile

      I would suggest that you not use perl. grep has built in switches to handle exactly this case.

      Except, of course, in the very common case that this code is part of a larger script or chain of processing, in which case calling grep as an external program has more disadvantages than just doing everything in Perl.
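
For the all-in-Perl case, a minimal sketch of a filter in the spirit of grep -v -F -f. The helper name build_filter is an assumption for illustration, and the filter strings are treated as literal text rather than patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches any of the given
# strings literally, roughly what "grep -F -f" does with its pattern file.
sub build_filter {
    my @strings = @_;
    chomp @strings;
    @strings = grep { length } @strings;        # ignore empty lines
    return qr/(?!)/ unless @strings;            # nothing to filter: never match
    my $alternation = join '|', map { quotemeta($_) } @strings;
    return qr/$alternation/;
}

my $filter = build_filter(
    "/dumpx/DUMP4X/var/level6/55\n",
    "/dumpx/DUMP4X/var/level7/58\n",
);

my @log = (
    "/dev/sda4 6 Fri Apr 12 04:30:02 2019 +0100 /dumpx/DUMP4X/var/level6/55 86.53 MB",
    "/dev/sdb1 6 Fri Apr 12 04:31:47 2019 +0100 /dumpx/DUMP4X/var/level6/56 27.73 MB",
);

# Keep only the lines that match none of the strings (the "-v" part).
my @kept = grep { $_ !~ $filter } @log;
print "$_\n" for @kept;
```

Unlike the lock-step loop, this does not rely on the two inputs being sorted, at the cost of holding all filter strings in memory.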