in reply to Re^2: How do I display only matches
in thread (SOLVED) How do I display only matches

No change is made to the default value of $/ (the input record separator; see perlvar), so readline (the <$fh> expression) is reading newline-terminated lines. Then chomp removes the $/ sequence (the newline) from each line. The /.{20}\\[a-zA-Z]\s[\r\n]/ regex requires a [\r\n] (carriage-return or newline) character to match, but chomp has removed the newline, and I doubt there is a \r present to match in text that seems to come from a Windows directory listing (update: see this for a more thorough discussion of this point).

    c:\@Work\Perl\monks>perl -wMstrict -le
    "print 'match with \r' if qq{ Directory of D:\\ \\Q\\X \r} =~ /.{20}\\ +[a-zA-Z]\s[\r\n]/;
     print 'match sans \r' if qq{ Directory of D:\\ \\Q\\X } =~ /.{20}\\ +[a-zA-Z]\s[\r\n]/;
    "
    match with \r

Update: Beyond that, the
    next if $row =~ /REGEX/;
statement will skip printing if there is a match. You seem to want to print the line if there is a match, so something like
    next unless $row =~ /REGEX/;
still seems the way to go (once you get the regex right :) (Update: See also Marshall's reply. It seems like really good advice, although I see no need to stringize  "$line" when  $line is already a string read from a file.)
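
A minimal sketch of the two points above, under stated assumptions: the sample listing and the pattern are invented for illustration and are not the OP's data or regex. After chomp there is no newline left in $row, so the pattern does not ask for one, and next unless keeps only the rows that match.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for a Windows directory listing.
my $listing = " Volume in drive D is DATA\n"
            . " Directory of D:\\Work\n"
            . " Directory of D:\\Work\\Perl\n"
            . "               3 File(s)\n";

# Hypothetical pattern: rows that begin with "Directory of".
# Note that it does not require a trailing \r or \n.
my $regex = qr/^\s*Directory of\s/;

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$listing or die "Cannot open in-memory file: $!";
my @matches;
while ( my $row = <$fh> ) {
    chomp $row;                    # removes the trailing $/ (the newline)
    next unless $row =~ $regex;    # skip rows that do NOT match
    push @matches, $row;
}
close $fh;

print "$_\n" for @matches;
```

With next if in place of next unless, the same loop would keep the non-matching rows instead.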


Give a man a fish:  <%-{-{-{-<

Re^4: How do I display only matches (updated)
by BillKSmith (Monsignor) on Sep 24, 2019 at 20:22 UTC
    I know it is somewhat off-topic, but allow me to discuss newlines on Windows. Windows disk files use the pair of ASCII characters CR,LF as a newline. When we run Perl on Windows, the I/O layer 'crlf' is active by default (ref open). On input, this layer translates the CR,LF to a single \n character. On output, it translates Perl's \n to the CR,LF pair and writes both to the disk (ref PerlIO; a link is available in open). As a result, there is no \r for a regex to find. The binmode function is provided to turn off this processing when we really need the \r's. (The easiest way to process Windows files on UNIX is to explicitly specify this layer in the open statement.)
    Bill
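
    The layer behavior described above can be sketched as follows. This is only an illustration, not a definitive recipe: the file contents are made up, and a temporary file from File::Temp stands in for a real Windows file. The same file is read once with :raw and once with :crlf, and the lines that still contain a \r are counted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with explicit CR,LF line endings. The :raw layer
# bypasses any newline translation, so the bytes land on disk as given.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
binmode $tmp_fh, ':raw';
print {$tmp_fh} "one\r\ntwo\r\n";
close $tmp_fh or die "Cannot close $path: $!";

# Read it back as raw bytes: each line still carries its \r.
open my $raw, '<:raw', $path or die "Cannot open $path: $!";
my @raw_lines = <$raw>;
close $raw;

# Read it back through the :crlf layer: CR,LF arrives as a single \n.
open my $text, '<:crlf', $path or die "Cannot open $path: $!";
my @crlf_lines = <$text>;
close $text;

# grep in scalar context returns the number of matching elements.
my $raw_cr  = grep { /\r/ } @raw_lines;
my $crlf_cr = grep { /\r/ } @crlf_lines;

print "raw:  $raw_cr line(s) containing a carriage return\n";
print "crlf: $crlf_cr line(s) containing a carriage return\n";
```

    Naming the layer in the open call, as in the second read, is what makes the same code behave identically on UNIX and on Windows.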
      Hi BillKSmith,

      I can see that there is a need for a FAQ on line endings.
      I'd be willing to cooperate with you on such a project.

      There are various cases to cover, such as reading a Unix file on Windows and reading a Windows file on Unix.
      Other demos should cover <CR><LF> as the standard line ending in network protocols.
      I have never used binmode to account for \r (<CR>). There are other ways.

      Any interest in combining our approaches to make a FAQ?

      Update: I see that there are some detailed explanations of the :crlf layer, etc.
      A simple "how to" for reading and writing Mac,Unix,PC files on any of the platforms could be helpful.

        A FAQ sounds like a good idea to me. I have no idea of the mechanics of how to create one. A FAQ is usually little more than an example of "how to...". That seems to be appropriate in this case. The more detailed discussion that you proposed belongs somewhere else, perhaps a tutorial.

        I suspect that Perl's concept of 'layers'(with appropriate defaults) works so well that most of us are not even aware that it exists. I had used perl for over fifteen years before I even read 'PerlIO'. (I still have not read all the documentation for open.) The issue came up when Laurent_R and I were helping someone on the old PerlGuru.com. We could use his help.

        Bill
Re^4: How do I display only matches (updated)
by Marshall (Canon) on Sep 24, 2019 at 05:39 UTC
    although I see no need to stringize "$line" when $line is already a string read from a file

    Correct, in this case it doesn't matter. However since print can take an optional file handle print $fh $data there are some cases where print can be confused about what the first token means (filehandle or something to print to a filehandle). I coded what I knew would work rather than the minimal formulation. I didn't take any time worrying about this detail. My main point as you observed was: "make the regex as complicated as it needs to be, but no more"!

    I will point out that the Regex character "$" solves the platform specific line ending \r\n vs \n vs no line ending problem. "$" matches the end of the string (or before newline at the end of the string; or before any newline if /m is used). I think \Z is the same. So /abc$/ matches lines ending in abc whether there is a line ending there or not.

      I will point out that the Regex character "$" solves the platform specific line ending \r\n vs \n vs no line ending problem.

      No, $ and \Z only work with \n, not \r:

      use warnings;
      use strict;
      use Data::Dump;
      for my $str ( "foo", "foo\n", "foo\r\n" ) {
          dd $str, scalar $str=~/o$/;
      }
      __END__
      ("foo", 1)
      ("foo\n", 1)
      ("foo\r\n", "")

      Output is the same on Windows and *NIX.

        I ran your code on my Windows machine and got the same results. Hooray! This is exactly as expected!!!

        I copied the text for "$" verbatim from the Perl regex docs. Normally that is good enough. However in this case I see that some further "yeah but" explanation is required!

        Problem 1: What "\n" means can be both platform and sometimes context dependent! Inside a Perl string "\n" is a single character, but when I write a "\n" to a text file on my Windows machine, two characters go out: <CR><LF> (Carriage Return, Line Feed). So when you write "foo\r\n", on Windows that produces <CR><CR><LF>. This extra <CR> means that the "o" is not directly followed by the line ending. There is indeed something else between the "o" and the line ending, so your regex doesn't match - this is correct behavior.

        You may not know this (many folks don't), but <CR><LF> is the network standard for line endings in most text-based Internet protocols, no matter what the OS platform. Note that Perl does not do this translation for you on a socket: on Unix, a "\n" written to a socket is still just <LF>, the same as a write to a disk file, so protocol code should send "\015\012" explicitly. Windows uses the network convention for disk files, so a "\n" written to a text file on Windows comes out as the 2 characters <CR><LF>.

        Problem 2: Not every cross platform case and every platform direction is handled automatically by Perl. If you are on a single platform, then "$" will work as the docs describe, as will chomp(). I have one program that needs to work on: a) old Mac, where "\n" means <CR> in files, b) Windows, where "\n" means <CR><LF> in files, c) Unix, where "\n" means <LF> in files. When I write code that has to work with all 3 platforms, I use a regex instead of chomp to delete the line endings. s/\s*$//; deletes all whitespace at the end of the line (including line endings like <CR><LF>, which are considered "whitespace").

        Another thought: I told the OP that there was no need to "chomp" if you are just going to add the line ending back in. That is true as long as you are processing a file and writing a file for the same platform. There are some cases where you'd "chomp" and then print "$_\n" to change the line endings.
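
        A rough sketch of that "strip, then add the line ending back" idea, assuming the input has already been split into lines. normalize_eol is a hypothetical helper name, not something from the OP's code. Be aware that s/\s*$// also removes trailing spaces and tabs, not only the line terminator.

```perl
use strict;
use warnings;

# Hypothetical helper: strip any trailing whitespace, including
# <CR>, <LF> or <CR><LF>, then append a single "\n".
sub normalize_eol {
    my ($line) = @_;
    $line =~ s/\s*$//;
    return "$line\n";
}

# The same text with no terminator, Unix, Windows and old Mac endings.
my @samples    = ( "foo", "foo\n", "foo\r\n", "foo\r" );
my @normalized = map { normalize_eol($_) } @samples;

# Report whether each sample was reduced to the same string.
for my $line (@normalized) {
    print( ( $line eq "foo\n" ) ? "ok\n" : "not ok\n" );
}
```

        On output, the "\n" is then written in whatever form the filehandle's layers dictate for the target platform.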

        I hope this post adds more clarity to the issue. But it probably raises more "yeah, but what if..." questions than it answers. This is all more complex than our OP asked about. I suggest starting a new thread if there is interest in discussing the "dirty details".

        Update: To allow for this <CR><CR><LF> situation:

        use warnings;
        use strict;
        use Data::Dump;
        for my $str ( "foo", "foo\n", "foo\r\n" ) {
            dd $str, scalar $str=~/o\s*$/;
        }
        __END__
        ("foo", 1)
        ("foo\n", 1)
        ("foo\r\n", 1)