^M chars in output file

locust has asked for the wisdom of the Perl Monks concerning the following question:

Re: ^M chars in output file by roboticus (Chancellor) on Nov 20, 2010 at 04:45 UTC
locust: Use `binmode` on your file handles so you'll only get the "\r" characters when you explicitly write them. Also, for trimming the ends of lines, I typically use `s/\s+$//`, because it removes all of the trailing junk I don't care about. ...roboticus
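A minimal sketch of roboticus's suggestion, assuming the input uses DOS (CRLF) line endings. In-memory filehandles (the `\ $scalar` form of `open`) stand in for real files so the example is self-contained; the variable names and sample data are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample input with DOS (CRLF) line endings and a stray space.
my $input = "1,2,3 \r\n4,5,6\r\n";
my $output;

open my $in,  '<', \$input  or die "open <: $!";
open my $out, '>', \$output or die "open >: $!";

# binmode turns off any newline translation layer, so the bytes read are
# exactly the bytes stored, and "\n" is written as a bare LF.
binmode $in;
binmode $out;

while ( my $line = <$in> ) {
    $line =~ s/\s+$//;       # strip CR, LF and any other trailing whitespace
    print {$out} "$line\n";  # write a Unix line ending explicitly
}

close $in  or die "close <: $!";
close $out or die "close >: $!";

print $output;
```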
Re^2: ^M chars in output file by locust (Sexton) on Nov 22, 2010 at 15:57 UTC
roboticus: This worked. However, it's important to point out that you must use `binmode` on both the input AND the output file. This drove me nuts for a while until I put the obvious together. After reading the doc on `binmode`, I gathered that, without it, Perl automatically writes the appropriate newline character at the end of each line for the OS you're using. Thanks
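To see why the output handle matters as well, the sketch below writes the same text through an in-memory handle with the `:crlf` PerlIO layer (the layer Perl uses for text files on Windows by default) and through one with `:raw` (what `binmode` gives you). Pushing `:crlf` explicitly is an assumption made so the Windows behaviour can be reproduced on any OS; `write_with_layer` is a hypothetical helper.

```perl
use strict;
use warnings;

# Write "Hello\nworld\n" through an in-memory handle opened with the
# given PerlIO layer and return the bytes that actually got stored.
sub write_with_layer {
    my ($layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "open >$layer: $!";
    print {$fh} "Hello\n", "world\n";
    close $fh or die "close: $!";
    return $buffer;
}

my $text_mode   = write_with_layer(':crlf');  # Windows text-mode behaviour
my $binary_mode = write_with_layer(':raw');   # behaviour after binmode

for my $pair ( [ ':crlf' => $text_mode ], [ ':raw' => $binary_mode ] ) {
    my ( $name, $bytes ) = @$pair;
    ( my $shown = $bytes ) =~ s/\r/\\r/g;  # make CR visible
    $shown =~ s/\n/\\n/g;                  # make LF visible
    print "$name $shown\n";
}
```

The `:crlf` handle stores `Hello\r\nworld\r\n` while the `:raw` one stores `Hello\nworld\n`, which is why a script that reads with `binmode` but writes without it can still produce ^M characters on Windows.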
Re: ^M chars in output file by johngg (Canon) on Nov 20, 2010 at 13:16 UTC
You could set the input record separator (see `$INPUT_RECORD_SEPARATOR` or `$/` in perlvar) to `"\r\n"`, use `chomp` to remove the separator and then add the `\n` at the output stage. Alternatively, open the file in `<:crlf` mode, which will remove the `CR` for you. In the code below I use `split`, `ord` and `sprintf` to show the individual characters as read and after `chomp`ing.

```perl
use strict;
use warnings;

open my $outFH, q{>}, \ my $dataFile
    or die qq{open: > scalar ref.: $!\n};
print $outFH qq{1,2,3\r\n}, qq{4,5,6\r\n}, qq{7,8,9\r\n};
close $outFH
    or die qq{close: > scalar ref.: $!\n};

{
    print qq{\nSetting input record separator to CRLF\n};
    local $/ = qq{\r\n};
    open my $inFH, q{<}, \ $dataFile
        or die qq{open: < scalar ref.: $!\n};
    while( <$inFH> )
    {
        print qq{Line $.\n};
        print qq{ Original: },
            qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
        chomp;
        print qq{ Chomped: },
            qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
    }
    close $inFH
        or die qq{close: < scalar ref.: $!\n};
}

print qq{\nOpening file in "<:crlf" mode\n};
open my $inFH, q{<:crlf}, \ $dataFile
    or die qq{open: < scalar ref.: $!\n};
while( <$inFH> )
{
    print qq{Line $.\n};
    print qq{ Original: },
        qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
    chomp;
    print qq{ Chomped: },
        qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
}
close $inFH
    or die qq{close: < scalar ref.: $!\n};
```

The output.

```
Setting input record separator to CRLF
Line 1
 Original: 0x31 0x2c 0x32 0x2c 0x33 0x0d 0x0a
 Chomped: 0x31 0x2c 0x32 0x2c 0x33
Line 2
 Original: 0x34 0x2c 0x35 0x2c 0x36 0x0d 0x0a
 Chomped: 0x34 0x2c 0x35 0x2c 0x36
Line 3
 Original: 0x37 0x2c 0x38 0x2c 0x39 0x0d 0x0a
 Chomped: 0x37 0x2c 0x38 0x2c 0x39

Opening file in "<:crlf" mode
Line 1
 Original: 0x31 0x2c 0x32 0x2c 0x33 0x0a
 Chomped: 0x31 0x2c 0x32 0x2c 0x33
Line 2
 Original: 0x34 0x2c 0x35 0x2c 0x36 0x0a
 Chomped: 0x34 0x2c 0x35 0x2c 0x36
Line 3
 Original: 0x37 0x2c 0x38 0x2c 0x39 0x0a
 Chomped: 0x37 0x2c 0x38 0x2c 0x39
```

I hope this is helpful.

Cheers, JohnGG
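The code above only shows the reading side. A minimal sketch of the full conversion (CRLF separator on input, `chomp`, then a bare `\n` on output), again using in-memory filehandles in place of real files; the variable names and data are illustrative. When writing to a real file on Windows, the output handle would also need `binmode`, otherwise the default `:crlf` layer turns each `\n` back into `\r\n`.

```perl
use strict;
use warnings;

# Sample DOS-format data held in memory (stands in for a real file).
my $dos_data = "1,2,3\r\n4,5,6\r\n7,8,9\r\n";
my $unix_data;

open my $in,  '<', \$dos_data  or die "open <: $!";
open my $out, '>', \$unix_data or die "open >: $!";

{
    local $/ = "\r\n";            # each input record ends in CRLF
    while ( my $line = <$in> ) {
        chomp $line;              # chomp removes the current $/, i.e. CRLF
        print {$out} $line, "\n"; # add a bare LF at the output stage
    }
}

close $in  or die "close <: $!";
close $out or die "close >: $!";
```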
Re: ^M chars in output file by planetscape (Chancellor) on Nov 20, 2010 at 14:03 UTC
See also flip, dos2unix, etc. HTH, planetscape
Re: ^M chars in output file by viveksnv (Sexton) on Nov 20, 2010 at 05:11 UTC
Hi, in my experience, ^M characters at the end of lines usually mean the file was created in DOS format and you are viewing it in a Unix environment. Did you try the dos2unix command in Linux?
Re^2: ^M chars in output file by codeacrobat (Chaplain) on Nov 20, 2010 at 13:02 UTC
Or just remove all ^M characters with a one-liner: `perl -pi -e 'tr[\r][]d' file`

`print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});`
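The one-liner deletes every carriage return in the file, wherever it appears. Below is a sketch of the same transformation as a subroutine, next to a stricter variant (not from the post) that only converts CR LF pairs and leaves a lone `\r` inside a line alone. Both subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Equivalent of tr[\r][]d: delete every CR character in the string.
sub strip_all_cr {
    my ($text) = @_;
    ( my $copy = $text ) =~ tr/\r//d;
    return $copy;
}

# Stricter variant: only turn CR LF pairs into LF, leaving lone CRs alone.
sub crlf_to_lf {
    my ($text) = @_;
    ( my $copy = $text ) =~ s/\r\n/\n/g;
    return $copy;
}

my $sample = "a\rb\r\nc\r\n";
print length strip_all_cr($sample), "\n";  # 5 characters: a b LF c LF
print length crlf_to_lf($sample),   "\n";  # 6 characters: the lone CR survives
```

On a Unix system, the file-based form of the stricter variant would be `perl -pi -e 's/\r\n\z/\n/' file`.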
Re: ^M chars in output file by chrestomanci (Priest) on Nov 20, 2010 at 21:12 UTC
The line endings of files generated by Perl will match the standard for the operating system you run Perl under. So if you run your Perl script under Windows and then examine the output under Unix/Linux, you will see extra ^M characters at the ends of lines. Try a simple test program:

```perl
#! perl
use strict;
use warnings;

# "\n" is written using the platform's native line ending.
open my $out, '>', 'test_out.txt'
    or die "Error writing to test_out.txt: $!";
print {$out} "Hello\n";
print {$out} "world\n";
close $out or die "Error closing test_out.txt: $!";
```

If you run the script above under Windows and then open its output under Unix, you should see those same ^M line endings. To see what is going on, open the output in a hex editor. The file generated under Windows should look something like this:

```
00000000  48 65 6C 6C 6F 0D 0A 77 6F - 72 6C 64 0D 0A    Hello..world..
```

While the file generated under Unix will look like this:

```
00000000  48 65 6C 6C 6F 0A 77 6F - 72 6C 64 0A    Hello.world.
```

Note that the Windows file ends each line with two characters (`0D 0A`, carriage return then line feed), whereas the Unix one uses just one (`0A`, line feed). This illustrates how different platforms use different newline codes. If you open a file generated on one platform using a dumb editor on a different one, you will see artefacts from the difference in line endings; for example, if you open the Unix output file in Windows Notepad, you won't see a newline between "Hello" and "world". (Smarter editors usually detect the difference in line endings and automatically do the right thing for you.)

In other words, the ^M codes you are seeing have nothing to do with your program or how it loads data; they come from the platform you are running your program under and the platform that generated its input data.
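If no hex editor is handy, Perl itself can show the bytes. A minimal sketch, assuming the file is small enough to read into memory at once; `hex_bytes` and `dump_file` are hypothetical helper names. The file is opened with `:raw` so that on Windows the `:crlf` layer does not hide the very `0D` bytes you are looking for.

```perl
use strict;
use warnings;

# Return the bytes of a string as space-separated upper-case hex pairs.
sub hex_bytes {
    my ($data) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $data;
}

# Dump a file's raw bytes; ':raw' stops any CRLF translation on read.
sub dump_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                     # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return hex_bytes( defined $data ? $data : '' );
}

# Usage: perl dump.pl test_out.txt
print dump_file( $ARGV[0] ), "\n" if @ARGV;
```

For the Windows output above this prints `48 65 6C 6C 6F 0D 0A 77 6F 72 6C 64 0D 0A`, and for the Unix output `48 65 6C 6C 6F 0A 77 6F 72 6C 64 0A`.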
If you are running your Perl script under Windows and then seeing the ^M codes when you read its output under Unix, then, as viveksnv suggested, you should use dos2unix to convert the output file. If your script runs under Unix but is processing files generated under Windows, then you can either use dos2unix to pre-convert all the input files before processing, or use a regular expression such as `$line =~ s/\s+$//` to strip all trailing whitespace from the end of each input line before further processing. This is more powerful than `chomp`, which with the default `$/` removes only the final LF: the regex removes both the CR and the LF (and any other trailing whitespace). Obviously you need to be careful with it if trailing whitespace on lines must be preserved.
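The difference is easiest to see on a single CRLF-terminated line read on Unix, where `$/` is the default `"\n"`. A short sketch with a made-up sample line; the third variant, which strips only the line ending and keeps trailing spaces, is an addition not in the post.

```perl
use strict;
use warnings;

my $line = "value  \r\n";   # trailing spaces, then a DOS line ending

my $chomped = $line;
chomp $chomped;                  # removes only "\n", leaving "value  \r"

my $trimmed = $line;
$trimmed =~ s/\s+$//;            # removes spaces, CR and LF: "value"

my $ending_only = $line;
$ending_only =~ s/\r?\n\z//;     # removes just the line ending: "value  "

for my $pair ( [ 'chomp' => $chomped ], [ 's/\s+$//' => $trimmed ],
               [ 's/\r?\n\z//' => $ending_only ] ) {
    ( my $shown = $pair->[1] ) =~ s/\r/\\r/g;   # make a leftover CR visible
    printf "%-12s [%s]\n", $pair->[0], $shown;
}
```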
Re: ^M chars in output file by Tux (Canon) on Nov 22, 2010 at 16:55 UTC
As a side note, both `\n` and `\r\n` are valid line endings for `CSV` files, and both will automatically be picked up by Perl's `CSV` parsing/writing modules Text::CSV_XS and Text::CSV. Enjoy, Have FUN! H.Merijn