in reply to Re: How do I display only matches
in thread (SOLVED) How do I display only matches

use strict;
use warnings;

my $filename = 'dirtest.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";

while (my $row = <$fh>) {
    chomp $row;
    next if $row =~ /.{20}\\[a-zA-Z]\s[\r\n]/;
    print "$row\n";
}

The following is the file content:

 Directory of D:\ \Q\X
09/20/2019 07:57 PM <DIR> .
09/20/2019 07:57 PM <DIR> ..

The regex works for this content in the online regex tester at https://regex101.com/r/LFrvLp/11. When I run the script, however, every row is displayed. Only the first row should be displayed, as it is the only row with a backslash in position 21.

Re^3: How do I display only matches
by Marshall (Canon) on Sep 24, 2019 at 04:50 UTC
    I think you may be overcomplicating things with a regex that is both more complicated and harder to understand than necessary. It looks like the file is a Windows dir listing, so I would suggest:
    while (my $line = <$fh>) { print "$line" if $line =~ /^\s+Directory of/; }
    No need to chomp if you are just going to add the line ending back in. Forcing at least one space at the beginning of the line narrows things down a lot. Putting in "Directory of" makes it very easy to understand which line of this file you are actually looking for. Please correct me if your dataset is more complicated than you've shown.

    I would also add that in my work, keying a regex to a particular column number is usually a bad idea because counting the columns can be error prone and there can be some variance if the file could have been generated with "cut-n-paste". YMMV

      My problem is to detect directories which are out of order. The proper order of the tree should read as follows:

       Directory of D:\ \Q
       Directory of D:\ \R
       Directory of D:\ \S

      Due to operator mishandling, the directories can look more like this:

      12345678901234567890
       Directory of D:\ \Q
       Directory of D:\ \Q\X
       Directory of D:\ \R
       Directory of D:\ \S

      I need to identify anomalies like \Q\X. FINDSTR doesn't work, so I came up with the regex /.{20}\\[a-zA-Z]\s[\r\n]/. There will always be 20 characters followed by a backslash, then a letter, then a space, and then a CRLF. I believe the regex is accurate because it works in the online regex tester https://regex101.com/r/LFrvLp/11

      I have zero experience with Perl scripting. I was looking for code to read a file and found the snippet with the "chomp" function. All I want is to read an input file and output a list of matches for the regex. Thank you for your assistance!
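One way to flag lines like \Q\X without counting columns, in line with the earlier advice to key on "Directory of", is to count the backslashes in the path. This is only a minimal sketch: the name is_out_of_order, the $expected_depth value of 2, and the sample lines are assumptions made for illustration, based on the "D:\ \Q" lines shown above.

```perl
use strict;
use warnings;

# Assumption: a properly placed directory has exactly two backslashes
# in its path, as in the "D:\ \Q" sample lines.
my $expected_depth = 2;

# Hypothetical helper: true if the line is a "Directory of" line whose
# path is nested deeper than expected.
sub is_out_of_order {
    my ($line) = @_;
    return 0 unless $line =~ /^\s*Directory of\s+(.*?)\s*$/;
    my $path  = $1;
    my $depth = () = $path =~ /\\/g;    # count the backslashes
    return $depth > $expected_depth ? 1 : 0;
}

# Sample lines standing in for the real listing.
my @listing = (
    " Directory of D:\\ \\Q",
    " Directory of D:\\ \\Q\\X",
    " Directory of D:\\ \\R",
    " Directory of D:\\ \\S",
);

my @anomalies = grep { is_out_of_order($_) } @listing;
print "$_\n" for @anomalies;
```

With the sample data, only the \Q\X line is reported. If the real tree is deeper, $expected_depth would need to change accordingly.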

Re^3: How do I display only matches (updated)
by AnomalousMonk (Archbishop) on Sep 24, 2019 at 04:27 UTC

    No change is made to the default value of $/ (the input record separator; see perlvar), so readline (the <$fh> expression) is reading newline-terminated lines. Then chomp removes the $/ sequence (the newline) from each line. The /.{20}\\[a-zA-Z]\s[\r\n]/ regex requires a [\r\n] (carriage-return or newline) character to match, but chomp has removed the newline, and I doubt there is a \r present with which to match in text that seems to come from a Windows directory listing (update: see this for a more thorough discussion of this point).

    c:\@Work\Perl\monks>perl -wMstrict -le
    "print 'match with \r' if qq{ Directory of D:\\ \\Q\\X \r} =~ /.{20}\\[a-zA-Z]\s[\r\n]/;
     print 'match sans \r' if qq{ Directory of D:\\ \\Q\\X } =~ /.{20}\\[a-zA-Z]\s[\r\n]/;
    "
    match with \r

    Update: Beyond that, the
        next if $row =~ /REGEX/;
    statement will skip printing if there is a match. You seem to want to print the line if there is a match, so something like
        next unless $row =~ /REGEX/;
    still seems the way to go (once you get the regex right :) (Update: See also Marshall's reply. It seems like really good advice, although I see no need to stringize "$line" when $line is already a string read from a file.)
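Putting the two points together (the line ending is already gone by the time the regex runs, and next if should be next unless), the loop might look something like the sketch below. It is not the original poster's final code. The matching_rows name is made up for illustration, an in-memory filehandle stands in for dirtest.txt so the example runs on its own, and the regex is anchored with ^ and ends in \s*$ instead of [\r\n], which is a deliberate change from the original.

```perl
use strict;
use warnings;

# Hypothetical helper: return the rows that have a backslash in
# column 21 followed by a single letter and optional trailing whitespace.
sub matching_rows {
    my ($fh) = @_;
    my @rows;
    while (my $row = <$fh>) {
        $row =~ s/\r?\n\z//;                          # drop LF or CRLF
        next unless $row =~ /^.{20}\\[a-zA-Z]\s*$/;   # keep matches only
        push @rows, $row;
    }
    return @rows;
}

# Sample data standing in for dirtest.txt (an in-memory filehandle).
my $sample = join '',
    " Directory of D:\\ \\Q\\X \r\n",
    "09/20/2019 07:57 PM <DIR> .\r\n",
    "09/20/2019 07:57 PM <DIR> ..\r\n";

open my $fh, '<', \$sample or die "Could not open in-memory file: $!";
my @rows = matching_rows($fh);
close $fh;

print "$_\n" for @rows;
```

With the sample data, only the "Directory of" row is printed. To run it against the real file, the in-memory open would be replaced by the open on $filename from the original script.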



      I know it is somewhat off-topic, but allow me to discuss newlines on Windows. Windows disk files use the pair of ASCII characters CR,LF as a newline. When we run perl on Windows, the IO layer 'crlf' is active by default (ref open). On input, this layer translates the CR,LF to a single \n character. On output, it translates perl's \n to the CR,LF pair and writes both to the disk (ref PerlIO; a link is available in open). There is no \r for a regex to find. The binmode function is provided to turn off this processing when we really need the \r's. (The easiest way to process Windows files on UNIX is to explicitly specify this layer in the open statement.)
      Bill
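The effect of the layer can be seen by reading the same CRLF-terminated file twice, once with :raw and once with :crlf. The sketch below is only an illustration: the first_line helper and the temporary file are assumptions made so the example is self-contained, and the layers are given explicitly so it should behave the same way on Windows and on UNIX.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write one CRLF-terminated line, byte for byte, to a temporary file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                        # no newline translation on output
print {$out} " Directory of D:\\ \\Q\\X \r\n";
close $out or die "close failed: $!";

# Hypothetical helper: read the first line through the given layer.
sub first_line {
    my ($layers) = @_;
    open my $in, "<$layers", $path or die "Could not open '$path': $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $raw  = first_line(':raw');       # bytes exactly as stored on disk
my $crlf = first_line(':crlf');      # CR,LF translated to a single \n

printf "raw  has \\r: %s\n", ($raw  =~ /\r/ ? 'yes' : 'no');
printf "crlf has \\r: %s\n", ($crlf =~ /\r/ ? 'yes' : 'no');
```

The :raw read keeps the \r, while the :crlf read hands the program a line ending in a plain \n, which is why a regex looking for \r finds nothing in the default Windows case.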
        Hi BillKSmith,

        I can see that there is a need for a FAQ on line endings.
        I'd be willing to cooperate with you on such a project?

        There are various cases involving reading a Unix file on Windows, reading Windows file on Unix.
        Other demos should cover <CR><LF> as the standard line ending in network protocols.
        I have never used binmode to account for \r (<CR>). There are other ways.

        Could we combine our approaches to make a FAQ?

        Update: I see that there are some detailed explanations of the :crlf layer, etc.
        A simple "how to" for reading and writing Mac,Unix,PC files on any of the platforms could be helpful.

      although I see no need to stringize "$line" when $line is already a string read from a file

      Correct, in this case it doesn't matter. However since print can take an optional file handle print $fh $data there are some cases where print can be confused about what the first token means (filehandle or something to print to a filehandle). I coded what I knew would work rather than the minimal formulation. I didn't take any time worrying about this detail. My main point as you observed was: "make the regex as complicated as it needs to be, but no more"!

      I will point out that the Regex character "$" solves the platform specific line ending \r\n vs \n vs no line ending problem. "$" matches the end of the string (or before newline at the end of the string; or before any newline if /m is used). I think \Z is the same. So /abc$/ matches lines ending in abc whether there is a line ending there or not.

        I will point out that the Regex character "$" solves the platform specific line ending \r\n vs \n vs no line ending problem.

        No, $ and \Z only work with \n, not \r:

        use warnings;
        use strict;
        use Data::Dump;
        for my $str ( "foo", "foo\n", "foo\r\n" ) {
            dd $str, scalar $str=~/o$/;
        }
        __END__
        ("foo", 1)
        ("foo\n", 1)
        ("foo\r\n", "")

        Output is the same on Windows and *NIX.
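If a stray \r might be present, one possible workaround is to allow for it explicitly, either with an optional \r before $ or by stripping the line ending up front. This is a sketch of those two options under that assumption, not the only way to handle it:

```perl
use strict;
use warnings;

my @results;
for my $str ("foo", "foo\n", "foo\r\n") {
    # \r? lets an optional carriage return sit between the text and
    # the end of the line.
    my $matched = $str =~ /o\r?$/ ? 1 : 0;

    # Alternatively, strip LF or CRLF so later code sees the same text.
    (my $clean = $str) =~ s/\r?\n\z//;

    push @results, [$matched, $clean];
    print "$matched '$clean'\n";
}
```

All three strings match and all three are reduced to 'foo', so the rest of the program no longer depends on which line ending the file used.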