in reply to Re^6: How do I display only matches
in thread (SOLVED) How do I display only matches

If I write a "\n" on my Windows machine, that means 2 characters: <CR><LF> (Carriage Return, Line Feed). So when you write "foo\r\n", on Windows that means <CR><CR><LF>. This extra <CR> means that the line doesn't end in "\n", <CR><LF> (Carriage Return, Line Feed). There is indeed something else between the "o" and the line ending and your regex doesn't match

I've seen you say something along these lines a couple times before, and I'm sorry, but it's flat out wrong.

C:\>perl -wMstrict -MDevel::Peek -e "my$x=qq{\n};Dump($x)"
SV = PV(0x32aa98) at 0x577ef8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x556ba4 "\n"\0
  CUR = 1
  LEN = 10
  COW_REFCNT = 1

Note CUR = 1 - there is only one character in that string, not two. I dimly remember that there might have been some builds of Perl for Windows back in the 90's that may have tried to handle it differently, and I remember being confused by this early on as well, but for a long time now, on both Windows and *NIX, \n is LF, and that's what Perl uses internally. The difference is that on Windows the :crlf layer translates that on input and output, but it doesn't change the internal representation, how regexes work, or the default $/ - even on Windows, it's "\n", one byte, "\x0A".
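For anyone who wants to check this without Devel::Peek, here is a minimal sketch that asks Perl directly about the string and about $/. The variable name $nl is just for illustration. On an ASCII-based platform (Windows or *NIX) it should report a length of 1, a code point of 0x0A, and "yes" for the last line.

```perl
use strict;
use warnings;

# "\n" is a single character inside a Perl string, on Windows and *NIX alike.
my $nl = "\n";
printf "length: %d\n", length($nl);      # number of characters in the string
printf "ord:    0x%02X\n", ord($nl);     # code point of that one character

# The default input record separator is that same one-character string.
printf "default \$/ is LF only: %s\n", ($/ eq "\n" ? "yes" : "no");
```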

I suggest you take the time to read Newlines in perlport and PerlIO.

I copied the text for "$" verbatim from the Perl regex docs.

Where in the Perl docs does it say $ matches before \r?

Re^8: How do I display only matches
by Marshall (Canon) on Sep 25, 2019 at 21:58 UTC
    Let's start with some basics:
    use warnings;
    use strict;
    open (my $file, '>', "testendings") or die "unable to open testendings for write! $!";
    print $file "test\n";
    The file testendings then contains these bytes (shown in hex):
    746573740D0A
    0D is CR and 0A is LF.

    Can you replicate this and agree with it on Windows?

      Can you replicate this and agree with it on Windows?

      Yes, of course, as I said, that's because on Windows, the :crlf layer is active by default. The single byte 0A in the string "test\x0A" is being translated by that layer to 0D0A on output, but the internal representation of that string is still just those five bytes, not six ("test\x0D\x0A" or "test\r\n") as you claimed earlier.
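      The same translation can be watched on any platform by pushing :crlf explicitly onto an in-memory filehandle. This is only a sketch: the names $string, $buffer and hex_bytes are made up for the illustration, and it assumes an explicit :crlf layer on a scalar handle behaves like the default text-mode layer on a Windows file.

      ```perl
      use strict;
      use warnings;

      # Write "test\n" through an explicit :crlf layer into an in-memory file,
      # so the translation is visible without needing a Windows box.
      my $string = "test\n";
      my $buffer = '';
      open(my $out, '>:crlf', \$buffer) or die "cannot open in-memory file: $!";
      print $out $string;
      close $out;

      # Hypothetical helper: one two-digit hex value per byte of the argument.
      sub hex_bytes { join ' ', map { sprintf('%02X', ord($_)) } split(//, $_[0]) }

      print "in the program: ", hex_bytes($string), "\n";   # the string as Perl holds it
      print "in the file:    ", hex_bytes($buffer), "\n";   # what the layer wrote out
      ```

      The first line should show five bytes ending in 0A, the second six bytes ending in 0D 0A. The string itself is never modified; only the copy that passes through the layer is.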

      It's all explained fairly well in Newlines in perlport and PerlIO. I suggest you take the time to read and understand that, and test the facts I've already shown for yourself, before we discuss further.

        Ok.
        On Windows, when you print "\n" to a text file, two bytes are written: 0x0D 0x0A.
        Verified using a binary editor.
        When reading said text file under Windows, with standard text file I/O, Perl will remove the 0x0D (<CR>) character. That happens as part of the I/O layer and is invisible to the Perl user program.
        I had thought that "\n" in the program and "\n" in the file meant the same bytes, whether writing or reading. It turns out that is NOT true.

        Some Windows test code:

        use warnings;
        use strict;

        open (my $file, '>', "testendings") or die "unable to open testendings for write! $!";
        print $file "test\n";
        close $file;
        print "Binary file created: 74 65 73 74 0D 0A\n";

        open (my $infile, '<', "testendings") or die "unable to open testendings for read! $!";
        my $in =<$infile>;
        close $infile;

        print "input as read by normal string IO: $in";
        print "Length in bytes of input var as read is: ".length($in)," bytes\n";
        $in =~ s/(.)/sprintf("%02X ",ord($1))/seg;
        print "$in\n";
        print "note: when using Perl text read, the 0x0D was deleted!\n";
        __END__
        Binary file created: 74 65 73 74 0D 0A   # 0D=>CR 0A=>LF
        input as read by normal string IO: test
        Length in bytes of input var as read is: 5 bytes
        74 65 73 74 0A
        note: when using text read, the 0x0D was deleted!
        My explanation of why "\r\n" doesn't work was functionally correct, but wrong in the details of why. On Windows, printing "\r\n" writes <CR><CR><LF> to the file. When that is read back in text mode, only one <CR> is deleted. The regex fails because there is still another <CR> before the line ending, and "$" is looking for a 0x0A. Correct?
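        A rough way to test that reading of events without touching the disk is a round trip through an in-memory file with :crlf pushed explicitly on both sides. The names $buffer, $line and hex_bytes are invented for this sketch, and the /foo\r?$/ pattern is only an illustrative workaround, not something proposed earlier in the thread.

        ```perl
        use strict;
        use warnings;

        # Write "foo\r\n" through :crlf: only the "\n" is expanded to CR LF,
        # so the explicit "\r" ends up in front of that pair.
        my $buffer = '';
        open(my $out, '>:crlf', \$buffer) or die "cannot open in-memory file: $!";
        print $out "foo\r\n";
        close $out;

        # Read it back through :crlf: one CR LF pair collapses to "\n".
        open(my $in, '<:crlf', \$buffer) or die "cannot open in-memory file: $!";
        my $line = <$in>;
        close $in;

        # Hypothetical helper: one two-digit hex value per byte of the argument.
        sub hex_bytes { join ' ', map { sprintf('%02X', ord($_)) } split(//, $_[0]) }

        print "file bytes: ", hex_bytes($buffer), "\n";
        print "line read:  ", hex_bytes($line), "\n";

        # "$" matches at the end of the string or just before a final "\n";
        # the leftover "\r" sits between "foo" and that position.
        print "/foo\$/     ", ($line =~ /foo$/    ? "matches" : "does not match"), "\n";
        print "/foo\\r?\$/  ", ($line =~ /foo\r?$/ ? "matches" : "does not match"), "\n";
        ```

        If the layer behaves as described above, the file holds 66 6F 6F 0D 0D 0A, the line read back is 66 6F 6F 0D 0A, the first pattern does not match and the second does.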