in reply to (SOLVED) How do I display only matches

Your regex  /.{54}\\[a-zA-Z]\s[\r\n]/g (please use  <code> ... </code> tags for all code, data and input/output; please see Writeup Formatting Tips) requires a  [\r\n] to match, but  chomp $row; will remove a newline from the end of each line (if present). Are you sure that a match is possible? Are you depending on a match against  \r (carriage-return)? (Also, the  /g modifier on the  m// has no effect, although it does no harm.)


Give a man a fish:  <%-{-{-{-<

Re^2: How do I display only matches
by tem2 (Novice) on Sep 24, 2019 at 03:12 UTC
    use strict;
    use warnings;

    my $filename = 'dirtest.txt';
    open(my $fh, '<:encoding(UTF-8)', $filename)
      or die "Could not open file '$filename' $!";

    while (my $row = <$fh>) {
      chomp $row;
      next if $row =~ /.{20}\\[a-zA-Z]\s[\r\n]/;
      print "$row\n";
    }

    The following is the file content:

     Directory of D:\ \Q\X
    09/20/2019 07:57 PM <DIR> .
    09/20/2019 07:57 PM <DIR> ..

    The regex works for this content in an online regex tester: https://regex101.com/r/LFrvLp/11. When I run the script, every row is displayed. Only the first row should be displayed, as it is the only row with a backslash in position 21.

      I think you may be overcomplicating things with a regex that is both more complicated and harder to understand than necessary. It looks like the file is a Windows dir listing? I would suggest:
      while (my $line = <$fh>) {
        print "$line" if $line =~ /^\s+Directory of/;
      }
      No need to chomp if you are just going to add the line ending back in. Forcing at least one space at the beginning of the line narrows things down a lot. Putting in "Directory of" makes it very easy to understand what line of this file you are actually looking for. Please correct me if your dataset is more complicated than you've shown.

      I would also add that in my work, keying a regex to a particular column number is usually a bad idea because counting the columns can be error prone and there can be some variance if the file could have been generated with "cut-n-paste". YMMV

        My problem is to detect directories which are out of order. The proper order of the tree should read as follows:

         Directory of D:\ \Q
         Directory of D:\ \R
         Directory of D:\ \S

        Due to operator mishandling, the directories can look more like this:

        12345678901234567890
         Directory of D:\ \Q
         Directory of D:\ \Q\X
         Directory of D:\ \R
         Directory of D:\ \S

        I need to identify anomalies like \Q\X. FINDSTR doesn't work, so I came up with the regex /.{20}\\[a-zA-Z]\s[\r\n]/. There will always be 20 characters followed by a backslash, then a letter, then a space, and then a CRLF. The regex is accurate because it works in the online regex tester: https://regex101.com/r/LFrvLp/11

        I have zero experience with Perl scripting. I was looking for code to read a file and found the snippet with the "chomp" function. All I want is to read an input file and output a list of matches for the regex. Thank you for your assistance!

      No change is made to the default value of  $/ (the input record separator; see perlvar), so readline (the  <$fh> expression) is reading newline-terminated lines. Then chomp removes the  $/ sequence (the newline) from each line. The  /.{20}\\[a-zA-Z]\s[\r\n]/ regex requires a  [\r\n] (carriage-return or newline) character to match, but chomp has removed the newline, and I doubt there is a  \r present with which to match in text that seems to come from a Windows directory listing (update: see this for a more thorough discussion of this point).

      c:\@Work\Perl\monks>perl -wMstrict -le
      "print 'match with \r' if qq{ Directory of D:\\ \\Q\\X \r} =~ /.{20}\\ +[a-zA-Z]\s[\r\n]/;
       print 'match sans \r' if qq{ Directory of D:\\ \\Q\\X } =~ /.{20}\\ +[a-zA-Z]\s[\r\n]/;
      "
      match with \r

      Update: Beyond that, the
          next if $row =~ /REGEX/;
      statement will skip printing if there is a match. You seem to want to print the line if there is a match, so something like
          next unless $row =~ /REGEX/;
      still seems the way to go (once you get the regex right :) (Update: See also Marshall's reply. It seems like really good advice, although I see no need to stringize  "$line" when  $line is already a string read from a file.)
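
A minimal sketch of the loop with both points applied, assuming the target lines look like the sample shown earlier in the thread (a backslash in column 21, then one letter, then optional trailing whitespace). The helper name is_nested_dir and the in-memory sample rows are illustrative and not part of the original script. The regex ends in \s*$ rather than [\r\n], so it should match whether or not the line ending has already been removed.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a line has a backslash in column 21,
# followed by a single letter and only whitespace up to the end.
sub is_nested_dir {
    my ($line) = @_;
    return $line =~ /^.{20}\\[a-zA-Z]\s*$/ ? 1 : 0;
}

# Sample rows standing in for dirtest.txt (assumed layout).
my @rows = (
    " Directory of D:\\ \\Q \n",
    " Directory of D:\\ \\Q\\X \n",
    " Directory of D:\\ \\R \n",
);

for my $row (@rows) {
    chomp $row;
    next unless is_nested_dir($row);   # skip everything that does NOT match
    print "$row\n";
}
```

Anchoring with ^ keeps the column count honest: without it, .{20} may start anywhere in the line.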



        I know it is somewhat off-topic, but allow me to discuss newlines on Windows. Windows disk files use the pair of ASCII characters CR,LF as a newline. When we run Perl on Windows, the I/O layer 'crlf' is active by default (ref open). On input, this layer translates the CR,LF pair to a single \n character. On output, it translates Perl's \n to the CR,LF pair and writes both to the disk (ref PerlIO; link available in open). There is no \r for a regex to find. The binmode function is provided to turn off this processing when we really need the \r's. (The easiest way to process Windows files on UNIX is to explicitly specify this layer in the open statement.)
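
The effect of the layer can be checked with a short sketch like the one below. It writes a small CRLF-terminated file in binary mode, then reads it back once through an explicit :crlf layer and once through :raw. The helper name first_line and the temporary file are assumptions made for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two CRLF-terminated lines byte-for-byte (no newline translation).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "abc\r\n", "def\r\n";
close $tmp_fh or die "close failed: $!";

# Hypothetical helper: read the first line of a file through a given layer.
sub first_line {
    my ($name, $layer) = @_;
    open my $fh, "<$layer", $name or die "Could not open '$name': $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

my $cooked = first_line($tmp_name, ':crlf');  # CR,LF translated to "\n"
my $raw    = first_line($tmp_name, ':raw');   # bytes passed through untouched

printf "crlf layer: %s\n", $cooked =~ /\r/ ? 'has \r' : 'no \r';
printf "raw layer:  %s\n", $raw    =~ /\r/ ? 'has \r' : 'no \r';
```

Specifying the layer explicitly in the open call makes the script behave the same way on Windows and UNIX.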
        Bill
        although I see no need to stringize "$line" when $line is already a string read from a file

        Correct, in this case it doesn't matter. However since print can take an optional file handle print $fh $data there are some cases where print can be confused about what the first token means (filehandle or something to print to a filehandle). I coded what I knew would work rather than the minimal formulation. I didn't take any time worrying about this detail. My main point as you observed was: "make the regex as complicated as it needs to be, but no more"!
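
As a small illustration of the filehandle point, here is a sketch that uses an in-memory filehandle (an assumption made so the example needs no real file). The block form print {$out} ... leaves no doubt about which token is the handle.

```perl
use strict;
use warnings;

my $line   = "Directory of D:\\ \\Q\n";
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";

print {$out} $line;   # block form: $out is unambiguously the filehandle
print $out $line;     # same meaning; note there is no comma after $out
close $out;

print $buffer;        # both copies of the line, now sent to STDOUT
```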

        I will point out that the regex anchor "$" solves the newline vs no line ending problem (and, once the I/O layer has translated \r\n to \n, the platform-specific \r\n vs \n problem as well). "$" matches at the end of the string (or before a newline at the end of the string; or before any newline if /m is used). I think \Z is the same. So /abc$/ matches lines ending in abc whether there is a newline there or not.
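
A quick way to check how the end-of-string anchors behave is a sketch like this one. The helper name anchors_matching is made up for the illustration. It also includes a string with a literal \r still in it, the case that arises when the file is read without the crlf layer.

```perl
use strict;
use warnings;

# Hypothetical helper: report which end-of-string anchors let
# /abc<anchor>/ match the given string.
sub anchors_matching {
    my ($s) = @_;
    my @hits;
    push @hits, '$'  if $s =~ /abc$/;
    push @hits, '\Z' if $s =~ /abc\Z/;
    push @hits, '\z' if $s =~ /abc\z/;
    return join ' ', @hits;
}

for my $s ("abc", "abc\n", "abc\r\n") {
    (my $shown = $s) =~ s/\r/\\r/g;   # make control characters visible
    $shown =~ s/\n/\\n/g;
    printf "%-8s => %s\n", $shown, anchors_matching($s) || '(none)';
}
```

If a \r can survive into the string, a pattern such as /abc\s*$/ is one way to tolerate it.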