Re^9: How do I display only matches

by haukex (Archbishop)
on Sep 25, 2019 at 22:08 UTC ( [id://11106706]=note )


in reply to Re^8: How do I display only matches
in thread (SOLVED) How do I display only matches

Can you replicate this and agree with it on Windows?

Yes, of course, as I said, that's because on Windows, the :crlf layer is active by default. The single byte 0A in the string "test\x0A" is being translated by that layer to 0D0A on output, but the internal representation of that string is still just those five bytes, not six ("test\x0D\x0A" or "test\r\n") as you claimed earlier.
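A minimal sketch of that point, for anyone who wants to test it: the :crlf layer is pushed explicitly here so the result should be the same on *NIX and Windows, and the scratch file from File::Temp is only an assumption for illustration, not part of the original test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for illustration only; File::Temp picks the name.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

my $str = "test\x0A";
my $len_in_memory = length $str;    # five characters: t, e, s, t, LF

# Write through an explicit :crlf layer (what Windows has by default).
open(my $out, '>:crlf', $path) or die "Cannot write $path: $!";
print {$out} $str;
close $out;

# Read back with :raw to see the bytes exactly as stored on disk.
open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

my $hex = join ' ', map { sprintf('%02X', ord($_)) } split(//, $on_disk);
print "in memory: $len_in_memory bytes\n";
print "on disk:   ", length($on_disk), " bytes ($hex)\n";
```

The string in memory stays at five bytes; only the copy that went through the layer gains the 0D.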

It's all explained fairly well in Newlines in perlport and PerlIO. I suggest you take the time to read and understand that, and test the facts I've already shown for yourself, before we discuss further.

Replies are listed 'Best First'.
Re^10: How do I display only matches
by Marshall (Canon) on Sep 26, 2019 at 00:52 UTC
    Ok.
    On Windows, when you print "\n", there will be 2 characters, 0x0D 0x0A.
    Verified using a binary editor.
    When reading said text file under Windows, with standard text file I/O, Perl will remove the 0x0D (<CR>) character. That happens as part of the I/O layer and is invisible to the Perl user program.
    I had thought that "\n" and "\n" had the same meaning whether write or read. It turns out that is NOT true.

    Some Windows test code:

    use warnings;
    use strict;

    open (my $file, '>', "testendings") or die "unable to open testendings for write! $!";
    print $file "test\n";
    close $file;
    print "Binary file created: 74 65 73 74 0D 0A\n";

    open (my $infile, '<', "testendings") or die "unable to open testendings for read! $!";
    my $in =<$infile>;
    close $infile;
    print "input as read by normal string IO: $in";
    print "Length in bytes of input var as read is: ".length($in)," bytes\n";
    $in =~ s/(.)/sprintf("%02X ",ord($1))/seg;
    print "$in\n";
    print "note: when using Perl text read, the 0x0D was deleted!\n";
    __END__
    Binary file created: 74 65 73 74 0D 0A   # 0D=>CR 0A=>LF
    input as read by normal string IO: test
    Length in bytes of input var as read is: 5 bytes
    74 65 73 74 0A
    note: when using text read, the 0x0D was deleted!

    My explanation of why "\r\n" doesn't work is functionally correct, but wrong in the details of why. On Windows this will print <CR><CR><LF>. When read back via text mode, only one <CR> will be deleted. The regex fails because there is still another <CR> there and "$" is looking for a 0x0A. Correct?
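The <CR><CR><LF> round trip can be sketched as below. This is only an illustration under assumptions: the :crlf layer is named explicitly so it should behave the same off Windows, the scratch file comes from File::Temp, and hex_dump is a made-up helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for illustration only.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Hypothetical helper: render a string as space-separated hex bytes.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $s);
}

# Print "\r\n" through :crlf; the layer expands the LF to CR LF.
open(my $out, '>:crlf', $path) or die "Cannot write $path: $!";
print {$out} "test\r\n";
close $out;

# Raw view of the file contents.
open(my $raw, '<:raw', $path) or die "Cannot read $path: $!";
my $on_disk = do { local $/; <$raw> };
close $raw;

# Read back through :crlf; only the CR LF pair is collapsed to LF.
open(my $in, '<:crlf', $path) or die "Cannot read $path: $!";
my $line = <$in>;
close $in;

print "on disk:   ", hex_dump($on_disk), "\n";
print "read back: ", hex_dump($line), "\n";
print 'matches /test$/? ', ($line =~ /test$/ ? "yes" : "no"), "\n";
```

One <CR> survives the read, so the line ends in "\r\n" and /test$/ has nothing to anchor against directly after "test".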
      On Windows, when you print "\n", there will be 2 characters, 0x0D 0x0A. ... I had thought that "\n" and "\n" had the same meaning whether write or read. It turns out that is NOT true.

      As I've said several times now, to be completely accurate (which is important here), on Windows and *NIX, the Perl string "\n" means "\x0A" (I think Perl could be compiled differently, but I'm not aware of any current builds that actually do this). What gets written or read depends on which PerlIO layers are in effect. This can even be changed dynamically while a handle is open using binmode, and it works the same on *NIX and Windows, except that :crlf is one of the defaults on Windows. To check which layers are in fact in effect, use my @layers = PerlIO::get_layers($handle);
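A rough sketch of checking and changing the layers on an open handle. The handle here is a File::Temp scratch file, which is just an assumption for illustration; the exact layer names printed will differ between platforms and builds, so no particular list is promised.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch handle for illustration only.
my ($handle, $path) = tempfile(UNLINK => 1);

# Layers in effect right after opening (platform dependent).
my @before = PerlIO::get_layers($handle);
print "default layers:       @before\n";

# Change the layers while the handle is open.
binmode($handle, ':crlf') or die "binmode failed: $!";
my @after = PerlIO::get_layers($handle);
print "after binmode(:crlf): @after\n";

# True if a layer named 'crlf' is now on the handle.
my $crlf_present = grep { $_ eq 'crlf' } @after;
print "crlf layer present: ", ($crlf_present ? "yes" : "no"), "\n";

close $handle;
```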

      On Windows this will print <CR><CR><LF>. When read back via text mode, only one <CR> will be deleted. The regex fails because there is still another <CR> there and "$" is looking for a 0x0A. Correct?

      Correct, yes - I would nitpick that in the examples I showed, there is no reading/writing going on, so there is no need to think about what translations might be happening. Once a string has been read into Perl (and its contents verified with a dumper module), regexes behave the same on both platforms, which is where this subthread started.

      Update: Also, note that "text mode" is somewhat misleading: technically, there is just the :crlf layer, which can either be active or not. Again, see PerlIO (and binmode).
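To illustrate the point about strings that are already in memory, here is a small sketch with no file I/O at all, so no layer can be involved. Data::Dumper with Useqq is one way to verify the contents; the sample strings and the %result hash are assumptions made up for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;    # show control characters as escapes
$Data::Dumper::Terse = 1;    # print the bare value, without "$VAR1 ="

my %result;
for my $s ("test\n", "test\r\n") {
    # "$" matches at the end of the string or just before a final "\n".
    $result{$s} = ($s =~ /test$/) ? 'match' : 'no match';

    my $shown = Dumper($s);
    chomp $shown;
    print "$shown: $result{$s}\n";
}
```

Since nothing is read or written, the outcome should not depend on the platform: the string that ends in CR LF fails because a CR sits between "test" and the final newline.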

        As I've said several times now, to be completely accurate (which is important here), on Windows and *NIX, the Perl string "\n" means "\x0A"

        That is not completely correct.
        When using Perl's default I/O layer, print "\n" will emit <LF> on Unix and <CR><LF> on Windows.

        When reading a text line (Unix or Windows), the <CR> will be deleted, if it exists.
        Update: When using "standard default I/O methods"
