tem2 has asked for the wisdom of the Perl Monks concerning the following question:

use strict;
use warnings;

my $filename = 'file.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";

while (my $row = <$fh>) {
    chomp $row;
    print "$row\n";
}

I'm not sure how to close the thread and I don't want to commit any mortal sins, so I'll edit my original post with closing remarks. When I turned on "show all symbols" in my input file, there was no space preceding the CRLF. All is well now. Thanks to everyone.

I am a beginner. The code above will display the contents of file.txt. I want to display only the records that match the regular expression: /.{54}\\a-zA-Z\s\r\n/g. Thank you for any assistance.

Re: How do I display only matches
by AnomalousMonk (Archbishop) on Sep 24, 2019 at 02:04 UTC

    Your regex  /.{54}\\[a-zA-Z]\s[\r\n]/g (please use  <code> ... </code> tags for all code, data and input/output; please see Writeup Formatting Tips) requires a  [\r\n] to match, but  chomp $row; will remove a newline from the end of each line (if present). Are you sure that a match is possible? Are you depending on a match against  \r (carriage-return)? (Also, the  /g modifier on the  m// has no effect, although it does no harm.)
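The interaction between chomp and a trailing [\r\n] can be checked on in-memory strings. This is only a minimal sketch: the helper name ends_with_eol and the sample strings are made up for illustration, and it assumes the default $/ of "\n" with no :crlf layer translating the data.

```perl
use strict;
use warnings;

# Hypothetical helper: does the line still end in CR or LF?
sub ends_with_eol {
    my ($line) = @_;
    return $line =~ /[\r\n]\z/ ? 1 : 0;
}

# Assumed sample data: one LF-terminated line, one CRLF-terminated line.
my $unix_line = "some text\n";
my $dos_line  = "some text\r\n";

chomp(my $unix_chomped = $unix_line);   # strips the trailing "\n"
chomp(my $dos_chomped  = $dos_line);    # strips "\n" only; the "\r" stays

print "unix: ", ends_with_eol($unix_chomped), "\n";   # prints "unix: 0"
print "dos:  ", ends_with_eol($dos_chomped),  "\n";   # prints "dos:  1"
```

If the first case describes your data, a regex that demands [\r\n] after chomp has nothing left to match against.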


    Give a man a fish:  <%-{-{-{-<

      use strict;
      use warnings;

      my $filename = 'dirtest.txt';
      open(my $fh, '<:encoding(UTF-8)', $filename)
          or die "Could not open file '$filename' $!";

      while (my $row = <$fh>) {
          chomp $row;
          next if $row =~ /.{20}\\[a-zA-Z]\s[\r\n]/;
          print "$row\n";
      }

      The following is the file content:

      Directory of D:\ \Q\X
      09/20/2019 07:57 PM <DIR> .
      09/20/2019 07:57 PM <DIR> ..

      The regex works for this content in the online regex tester at https://regex101.com/r/LFrvLp/11. When I run the script, every row is displayed; it should only be the first row, as it is the only row with a backslash in position 21.

        I think you may be overcomplicating things with a regex that is both more complicated and harder to understand than necessary. I mean, it looks like the file is a Windows dir listing? I would suggest:
        while (my $line = <$fh>) { print "$line" if $line =~ /^\s+Directory of/; }
        No need to chomp if you are just going to add the line ending back in. Forcing at least one space at the beginning of the line narrows things down a lot. Putting in "Directory of" makes it very easy to understand what line of this file you are actually looking for. Please correct me if your dataset is more complicated than you've shown.

        I would also add that in my work, keying a regex to a particular column number is usually a bad idea because counting the columns can be error prone and there can be some variance if the file could have been generated with "cut-n-paste". YMMV
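To try the text-anchored match without a file on disk, something like the following may help. It is a rough sketch: the helper name directory_lines and the sample listing (including its spacing and path) are assumptions for illustration, not the OP's actual data. It reads from an in-memory filehandle so that the loop looks the same as one reading a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the "Directory of" header lines.
sub directory_lines {
    my ($text) = @_;
    # Open a filehandle on the string itself (in-memory file).
    open(my $fh, '<', \$text) or die "Could not open in-memory file: $!";
    my @found;
    while (my $line = <$fh>) {
        # Key on the text, not on a column count.
        push @found, $line if $line =~ /^\s+Directory of/;
    }
    close $fh;
    return @found;
}

# Assumed sample data, modeled loosely on a Windows dir listing.
my $listing = join '',
    " Directory of D:\\Q\\X\n",
    "\n",
    "09/20/2019  07:57 PM    <DIR>          .\n",
    "09/20/2019  07:57 PM    <DIR>          ..\n";

# The lines still carry their newlines, so no chomp and no added "\n".
print for directory_lines($listing);
```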

        No change is made to the default value of  $/ (the input record separator; see perlvar), so readline (the  <$fh> expression) is reading newline-terminated lines. Then chomp removes the  $/ sequence (the newline) from each line. The  /.{20}\\[a-zA-Z]\s[\r\n]/ regex requires a  [\r\n] (carriage-return or newline) character to match, but chomp has removed the newline, and I doubt there is a  \r present with which to match in text that seems to come from a Windows directory listing (update: see this for a more thorough discussion of this point).

        c:\@Work\Perl\monks>perl -wMstrict -le
        "print 'match with \r' if qq{ Directory of D:\\ \\Q\\X \r} =~ /.{20}\\ +[a-zA-Z]\s[\r\n]/;
         print 'match sans \r' if qq{ Directory of D:\\ \\Q\\X } =~ /.{20}\\ +[a-zA-Z]\s[\r\n]/;
        "
        match with \r

        Update: Beyond that, the
            next if $row =~ /REGEX/;
        statement will skip printing if there is a match. You seem to want to print the line if there is a match, so something like
            next unless $row =~ /REGEX/;
        still seems the way to go (once you get the regex right :) (Update: See also Marshall's reply. It seems like really good advice, although I see no need to stringize  "$line" when  $line is already a string read from a file.)
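The difference between the two statements is easy to see on a couple of rows. The sketch below is illustrative only: matching_rows is a made-up helper, and the rows and the simplified pattern are assumptions rather than the OP's real data or regex.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the rows that match $regex.
sub matching_rows {
    my ($regex, @rows) = @_;
    my @kept;
    for my $row (@rows) {
        next unless $row =~ $regex;   # skip non-matches, keep matches
        push @kept, $row;
    }
    return @kept;
}

# Assumed sample rows; the pattern looks for a backslash followed by a letter.
my @rows = (
    'Directory of D:\\Q\\X',
    '09/20/2019  07:57 PM    <DIR>  .',
);

# Only the first row contains a backslash, so only it is printed.
print "$_\n" for matching_rows(qr/\\[a-zA-Z]/, @rows);
```

Swapping "next unless" for "next if" in the helper would invert the result and print only the rows that do not match.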



Re: How do I display only matches
by LanX (Saint) on Sep 24, 2019 at 01:47 UTC
    put this between chomp and print

     next if $row =~ /REGEX/;

    Update

    Of course this must be inverted

    next unless $row =~ /REGEX/;

    Thanks AnomalousMonk

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
    FootballPerl is like chess, only without the dice

      Shouldn't that be
          next unless $row =~ /REGEX/;
      or maybe
          next if $row !~ /REGEX/;



        use strict;
        use warnings;

        my $filename = 'dirtest.txt';
        open(my $fh, '<:encoding(UTF-8)', $filename)
            or die "Could not open file '$filename' $!";

        while (my $row = <$fh>) {
            chomp $row;
            next if $row =~ /.{20}\\[a-zA-Z]\s[\r\n]/;
            print "$row\n";
        }

        The following is the file content:

        Directory of D:\ \Q\X
        09/20/2019 07:57 PM <DIR> .
        09/20/2019 07:57 PM <DIR> ..

        The regex works for this content in the online regex tester at https://regex101.com/r/LFrvLp/11. But when I run the script, EVERY row is displayed; it should only be the FIRST row, as it is the only row with a backslash in position 21.

Re: How do I display only matches
by Anonymous Monk on Sep 24, 2019 at 03:55 UTC
    autodie handles your errors with less effort. Also try the open pragma with an encoding, especially if opening more than one file, for cleaner code:
    use strict;
    use warnings;
    use autodie;
    use open ':encoding(UTF-8)';

    my $filename = 'dirtest.txt';
    open my $fh, '<', $filename;
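A slightly fuller sketch of the same idea, in case it helps to see it run end to end. The helper name read_lines and the temporary file contents are assumptions for illustration; File::Temp is used only so the example does not depend on a dirtest.txt being present.

```perl
use strict;
use warnings;
use autodie;                       # open and close die on failure
use open ':encoding(UTF-8)';       # default layer for open() in this scope
use File::Temp qw(tempfile);

# Hypothetical helper: read a file and return its lines without newlines.
sub read_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename;   # no "or die" needed under autodie
    my @lines = <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

# Assumed sample data written to a throwaway file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first\nsecond\n";
close $tmp_fh;

print "$_\n" for read_lines($tmp_name);
```

With more than one file, each extra open is a single line with no layer or error handling repeated.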