
Ok.
On Windows, when you print "\n", there will be 2 characters, 0x0D 0x0A.
Verified using a binary editor.
When reading said text file under Windows, with standard text file I/O, Perl will remove the 0x0D (<CR>) character. That happens as part of the I/O layer and is invisible to the Perl user program.
I had thought that "\n" had the same meaning whether writing or reading. It turns out that is NOT true.

Some Windows test code:

use warnings;
use strict;

open (my $file, '>', "testendings") or die "unable to open testendings for write! $!";
print $file "test\n";
close $file;
print "Binary file created: 74 65 73 74 0D 0A\n";

open (my $infile, '<', "testendings") or die "unable to open testendings for read! $!";
my $in = <$infile>;
close $infile;
print "input as read by normal string IO: $in";
print "Length in bytes of input var as read is: ".length($in)," bytes\n";
$in =~ s/(.)/sprintf("%02X ",ord($1))/seg;
print "$in\n";
print "note: when using Perl text read, the 0x0D was deleted!\n";

__END__
Binary file created: 74 65 73 74 0D 0A   # 0D=>CR 0A=>LF
input as read by normal string IO: test
Length in bytes of input var as read is: 5 bytes
74 65 73 74 0A
note: when using text read, the 0x0D was deleted!
My explanation of why "\r\n" doesn't work is functionally correct, but wrong in the details of why. On Windows this will print <CR><CR><LF>. When read back via text mode, only one <CR> will be deleted. The regex fails because there is still another <CR> there and "$" is looking for a 0x0A. Correct?
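The <CR><CR><LF> reasoning can be checked without a Windows machine by pushing the :crlf layer explicitly (it is one of the Windows defaults). This is a minimal sketch, assuming a writable temporary directory; `hexbytes` is a hypothetical helper, and the file name `testendings` is simply reused from the test above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: hex dump of a string's characters, e.g. "74 65 73 74 0A".
sub hexbytes { join ' ', map { sprintf '%02X', ord } split //, shift }

my $dir      = tempdir(CLEANUP => 1);
my $filename = File::Spec->catfile($dir, 'testendings');

my (%on_disk, %read_back);
for my $string ("test\n", "test\r\n") {
    # Write through an explicit :crlf layer so the result is the same
    # on *NIX and on Windows.
    open my $out, '>:crlf', $filename or die "unable to open $filename for write: $!";
    print $out $string;
    close $out;

    # Read the bytes back untranslated to see what is really in the file.
    open my $raw, '<:raw', $filename or die "unable to open $filename for read: $!";
    $on_disk{$string} = do { local $/; <$raw> };
    close $raw;

    # Read one line back through :crlf, as Windows does by default.
    open my $in, '<:crlf', $filename or die "unable to open $filename for read: $!";
    $read_back{$string} = <$in>;
    close $in;

    printf "%-17s on disk: %-20s read back: %-17s /^test\$/ %s\n",
        hexbytes($string), hexbytes($on_disk{$string}), hexbytes($read_back{$string}),
        ($read_back{$string} =~ /^test$/ ? 'matches' : 'does not match');
}
```

Only the <CR> directly in front of the <LF> is removed on input, so "test\r\n" comes back with one <CR> still in place and /^test$/ no longer matches.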

Re^11: How do I display only matches (updated)
by haukex (Archbishop) on Sep 26, 2019 at 06:04 UTC
    On Windows, when you print "\n", there will be 2 characters, 0x0D 0x0A. ... I had thought that "\n" had the same meaning whether writing or reading. It turns out that is NOT true.

    As I've said several times now, to be completely accurate (which is important here), on Windows and *NIX, the Perl string "\n" means "\x0A" (I think Perl could be compiled differently, but I'm not aware of any current builds that actually do this). What gets written or read depends on which PerlIO layers are in effect. This can even be changed dynamically while a handle is open using binmode, and it works the same on *NIX and Windows, except that :crlf is one of the defaults on Windows. To check which layers are in fact in effect, use my @layers = PerlIO::get_layers($handle);
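A minimal sketch of inspecting and switching layers on a handle that is already open, assuming a writable temporary directory (the file name `layers.txt` is arbitrary). The exact list printed by PerlIO::get_layers differs between platforms and builds, so the bytes that end up on disk are the part worth comparing.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $filename = File::Spec->catfile($dir, 'layers.txt');

open my $fh, '>', $filename or die "unable to open $filename: $!";
my @default = PerlIO::get_layers($fh);
print "default layers: @default\n";

binmode $fh, ':crlf';            # turn CRLF translation on while the handle is open
my @with_crlf = PerlIO::get_layers($fh);
print "after binmode :crlf: @with_crlf\n";
print $fh "a\n";                 # "\n" is written as 0D 0A

binmode $fh;                     # no layer given means :raw, i.e. no translation
my @after_raw = PerlIO::get_layers($fh);
print "after binmode: @after_raw\n";
print $fh "b\n";                 # "\n" is written as 0A
close $fh;

# Dump the file's bytes without any translation.
open my $raw, '<:raw', $filename or die "unable to open $filename: $!";
my $bytes = do { local $/; <$raw> };
close $raw;
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```

The same "\n" character produced two different byte sequences in one file, purely because of which layers were active at the time.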

    On Windows this will print <CR><CR><LF>. When read back via text mode, only one <CR> will be deleted. The regex fails because there is still another <CR> there and "$" is looking for a 0x0A. Correct?

    Correct, yes - I would nitpick that in the examples I showed, there is no reading/writing going on, so there is no need to think about what translations might be happening. Once a string has been read into Perl (and its contents verified with a dumper module), regexes behave the same on both platforms, which is where this subthread started.
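Once the data is in a Perl string, only its characters matter: $ matches at the end of the string or just before a string-final "\n", but not before "\r\n". A minimal sketch; the optional-\r pattern is one possible workaround chosen for illustration, not something proposed in this thread.

```perl
use strict;
use warnings;

# Two strings as they might look after being read into Perl.
my %strings = (LF => "test\n", CRLF => "test\r\n");

my %matches;
for my $name (sort keys %strings) {
    my $s = $strings{$name};
    # $ matches at the very end or just before a final "\n", not before "\r\n"
    $matches{$name}{plain}    = $s =~ /^test$/    ? 1 : 0;
    # Illustrative workaround: allow an optional CR before the end
    $matches{$name}{optional} = $s =~ /^test\r?$/ ? 1 : 0;
    printf "%-4s  /^test\$/: %-8s  /^test\\r?\$/: %s\n", $name,
        ($matches{$name}{plain}    ? 'match' : 'no match'),
        ($matches{$name}{optional} ? 'match' : 'no match');
}
```

This runs identically on *NIX and Windows because no I/O layer is involved.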

    Update: Also, note that "text mode" is somewhat misleading: technically, there is just the :crlf layer, which can either be active or not. Again, see PerlIO (and binmode).

      As I've said several times now, to be completely accurate (which is important here), on Windows and *NIX, the Perl string "\n" means "\x0A"

      That is not completely correct.
      When using Perl's default I/O layer, print "\n" will emit <LF> on Unix and <CR><LF> on Windows.

      When reading a text line (Unix or Windows), the <CR> will be deleted, if it exists.
      Update: When using "standard default I/O methods"

        When using Perl's default I/O layer, print "\n" will emit <LF> on Unix and <CR><LF> on Windows.

        Yes, you're just repeating back to me what I said several times now.

        When reading a text line (Unix or Windows), the <CR> will be deleted, if it exists.

        For *NIX that is once again wrong.

        $ hexdump -C test.txt
        00000000  46 6f 6f 0d 0a  |Foo..|
        $ perl -wMstrict -MData::Dump -e 'dd <>' test.txt
        "Foo\r\n"
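The same comparison can be made on a single machine by choosing the input layer explicitly. In this minimal sketch, '<:raw' stands in for the *NIX default (which has no :crlf layer) and '<:crlf' for the Windows default; that substitution is an assumption made for the demonstration, and the file contents match the hexdump above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $filename = File::Spec->catfile($dir, 'test.txt');

# Same bytes as the hexdump above: 46 6f 6f 0d 0a ("Foo", CR, LF).
open my $out, '>:raw', $filename or die "unable to open $filename: $!";
print $out "Foo\r\n";
close $out;

my %line;
for my $layer (':raw', ':crlf') {
    open my $in, "<$layer", $filename or die "unable to open $filename: $!";
    $line{$layer} = <$in>;
    close $in;
    # Make CR and LF visible when printing.
    (my $shown = $line{$layer}) =~ s/([\r\n])/$1 eq "\r" ? '\r' : '\n'/ge;
    print "read with <${layer}: \"$shown\"\n";
}
```

Reading a line removes the <CR> only when :crlf is on the handle; without it, "Foo\r\n" arrives intact.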
        Update: When using "standard default I/O methods"

        But that's not the only thing this discussion was about. It's also about you insisting that (essentially) $ matches before \r, insisting that (essentially) "\n" eq "\r\n" and that somehow all network communication magically uses CRLF, or stuff like the above. Sorry, but all the evidence appears to point to a fundamental misunderstanding of the topic.

        That is not completely correct.

        Are you just trolling now? You clearly still haven't studied the subject matter enough, I'm not sure if you're even reading my posts, and I'm tired of repeating myself and running tests that you could be running, so for now, I'm out.