locust has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks,

I wrote a script at work today where I read from a csv file, and an html file, did some data manipulation, and spit it back out to another csv file. All was fine except I had the darn ^M chars at the end of my line. I know they are carriage returns, but I don't know why they decided to show up. I initially thought it was the way I was reading the file, so I tried two methods.

First I tried slurping. But I got the ^Ms. Then I tried reading the file line by line and chomping each one. Still, control ^Ms. Then my coworker started asking me what the hang-up was and I had to make do with what I had. It was a frustrating day to say the least.

I was hoping someone may know what the dillio is on this one. I've been developing with Strawberry Perl on Windows 7 of late, but I don't think that's what it was. I think it may have had to do with not chomping off the newline chars.

Ideas?

Replies are listed 'Best First'.
Re: ^M chars in output file
by roboticus (Chancellor) on Nov 20, 2010 at 04:45 UTC

    locust:

    Use binmode on your file handles so you'll only get the "\r" characters when you want to write them. Also, for trimming the end of lines, I typically use s/\s+$//, because it'll remove *all* of the junk at the end that I don't care about.

    ...roboticus

      roboticus:

      This worked. However, it's important to point out that one must use binmode on both the input AND the output file handle. This drove me nuts for a while until I put the obvious together.

      After reading the doc on binmode I gathered that, without it, Perl will automatically put the appropriate newline char at the end of the line for the OS you're using.

      Thanks
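For anyone landing here later, here is a minimal sketch of the binmode point above. It is not locust's actual script: the helper names bytes_on_disk and hex_dump are made up for illustration, and the ':crlf' layer is used to imitate Windows text mode so the difference shows up on any platform. Opening with ':raw' should have the same effect on line endings as calling binmode on the handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given PerlIO layer,
# then read the file back untranslated and return the bytes on disk.
sub bytes_on_disk {
    my ($layer, $text) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh or die "close $path: $!";

    open my $out, ">$layer", $path or die "open > $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";

    open my $in, '<:raw', $path or die "open < $path: $!";
    my $bytes = do { local $/; <$in> };    # slurp the whole file
    close $in or die "close $path: $!";

    return $bytes;
}

# Hypothetical helper: show each byte as two hex digits.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, shift }

my $text_mode   = bytes_on_disk(':crlf', "1,2,3\n");   # imitates Windows text mode
my $binary_mode = bytes_on_disk(':raw',  "1,2,3\n");   # same idea as binmode

print "text mode   (:crlf): ", hex_dump($text_mode),   "\n";
print "binary mode (:raw):  ", hex_dump($binary_mode), "\n";
```

The first dump ends in 0D 0A and the second in a bare 0A, which is why a file written in text mode on Windows shows ^M when viewed on Unix.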

Re: ^M chars in output file
by johngg (Canon) on Nov 20, 2010 at 13:16 UTC

    You could set the input record separator (see $INPUT_RECORD_SEPARATOR or $/ in perlvar) to "\r\n", use chomp to remove the separator and then add the \n at the output stage. Alternatively, open the file in <:crlf mode which will remove the CR for you. In the code below I use split, ord and sprintf to show the individual characters as read and after chomping.

    use strict;
    use warnings;

    open my $outFH, q{>}, \ my $dataFile
        or die qq{open: > scalar ref.: $!\n};
    print $outFH
        qq{1,2,3\r\n},
        qq{4,5,6\r\n},
        qq{7,8,9\r\n};
    close $outFH
        or die qq{close: > scalar ref.: $!\n};

    {
        print qq{\nSetting input record separator to CRLF\n};
        local $/ = qq{\r\n};
        open my $inFH, q{<}, \ $dataFile
            or die qq{open: < scalar ref.: $!\n};
        while( <$inFH> )
        {
            print qq{Line $.\n};
            print
                qq{ Original: },
                qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
            chomp;
            print
                qq{ Chomped: },
                qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
        }
        close $inFH
            or die qq{close: < scalar ref.: $!\n};
    }

    print qq{\nOpening file in "<:crlf" mode\n};
    open my $inFH, q{<:crlf}, \ $dataFile
        or die qq{open: < scalar ref.: $!\n};
    while( <$inFH> )
    {
        print qq{Line $.\n};
        print
            qq{ Original: },
            qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
        chomp;
        print
            qq{ Chomped: },
            qq{@{ [ map sprintf( q{%#.2x}, ord ), split m{} ] }\n};
    }
    close $inFH
        or die qq{close: < scalar ref.: $!\n};

    The output.

    Setting input record separator to CRLF
    Line 1
     Original: 0x31 0x2c 0x32 0x2c 0x33 0x0d 0x0a
     Chomped: 0x31 0x2c 0x32 0x2c 0x33
    Line 2
     Original: 0x34 0x2c 0x35 0x2c 0x36 0x0d 0x0a
     Chomped: 0x34 0x2c 0x35 0x2c 0x36
    Line 3
     Original: 0x37 0x2c 0x38 0x2c 0x39 0x0d 0x0a
     Chomped: 0x37 0x2c 0x38 0x2c 0x39

    Opening file in "<:crlf" mode
    Line 1
     Original: 0x31 0x2c 0x32 0x2c 0x33 0x0a
     Chomped: 0x31 0x2c 0x32 0x2c 0x33
    Line 2
     Original: 0x34 0x2c 0x35 0x2c 0x36 0x0a
     Chomped: 0x34 0x2c 0x35 0x2c 0x36
    Line 3
     Original: 0x37 0x2c 0x38 0x2c 0x39 0x0a
     Chomped: 0x37 0x2c 0x38 0x2c 0x39

    I hope this is helpful.

    Cheers,

    JohnGG

Re: ^M chars in output file
by planetscape (Chancellor) on Nov 20, 2010 at 14:03 UTC
Re: ^M chars in output file
by viveksnv (Sexton) on Nov 20, 2010 at 05:11 UTC
    Hi
    My experience is,

    ^M characters at the ends of lines may mean that the file was created in DOS format and you are viewing it in a Unix environment.

    Did you try the dos2unix command on Linux?
      Or just remove all ^M with a one-liner
      perl -pi -e 'tr[\r][]d' file

      print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
Re: ^M chars in output file
by chrestomanci (Priest) on Nov 20, 2010 at 21:12 UTC

    The line endings of files generated by perl will match the standard for the operating system that you run perl under. So if you run your perl script under windows, and then examine the output with unix/linux, you will see an extra ^M at the end of each line.

    Try a simple test program

    #! perl
    open OUTFILE, '>', 'test_out.txt' or die "Error writing to test_out.txt $!";
    print OUTFILE "Hello\n";
    print OUTFILE "world\n";
    close OUTFILE;

    With the script above, if you run it under windows and then open the output file under unix, you should see those same ^M line endings.

    To see what is going on, open the output in a hex editor. The file generated under windows should look something like this:

    00000000 48 65 6C 6C 6F 0D 0A 77 6F - 72 6C 64 0D 0A Hello..world..

    While the file generated under unix will look like:

    00000000 48 65 6C 6C 6F 0A 77 6F - 72 6C 64 0A Hello.world.

    Note that the file from windows has two characters at the end of each line (0D 0A), whereas the unix one has just one (0A). This illustrates how the different platforms have different newline codes. If you open a file generated on one platform using a dumb editor on a different one, then you will see artefacts from the difference in line endings. For example, if you open the unix output file using windows notepad, you won't see a newline between hello and world. (Smarter editors usually detect the difference in line ending and automatically do the right thing for you.)
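If you don't have a hex editor handy, a few lines of Perl will do a similar job. This is only a sketch: hex_bytes is a made-up helper name, and the two strings stand in for the contents of the windows and unix output files described above.

```perl
use strict;
use warnings;

# Hypothetical helper: render each byte of a string as two hex digits.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Stand-ins for the two versions of test_out.txt.
my $windows = "Hello\r\nworld\r\n";
my $unix    = "Hello\nworld\n";

print "windows: ", hex_bytes($windows), "\n";
print "unix:    ", hex_bytes($unix),    "\n";
```

To inspect a real file the same way, open it with the '<:raw' layer (or call binmode on the handle) before reading, so that perl does not translate the line endings before you get to look at them.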

    In other words, the ^M codes you are seeing have nothing to do with your program, or how it is loading data, but come from the platform you are running your program under, and the platform which generated its input data.

    If you are running your perl script under windows, and then seeing the ^M codes when you read its output under unix, then as viveksnv suggested, you should use dos2unix to convert the output file.

    If your script runs under unix, but is processing files generated under windows, then you can either use dos2unix to pre-convert all the input files before processing, or you can use a regular expression such as $line =~ s/\s+$// to strip all trailing white space from the end of each input line before further processing. This is more powerful than chomp as it will remove more than one newline character, though obviously you need to be careful with it if you might need trailing white space on lines to be preserved.
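A small sketch of the trade-off just described, using a made-up input line with trailing spaces followed by CRLF. The third variant, s/\r?\n\z//, is not from the posts above; it is one possible middle ground that removes only the line ending. The visible helper is hypothetical and exists just to make the control characters printable.

```perl
use strict;
use warnings;

# Hypothetical helper: make CR and LF visible and bracket the string.
sub visible {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return "[$s]";
}

my $raw = "a,b,c  \r\n";    # two trailing spaces, then CRLF

my $chomped = $raw;
chomp $chomped;                          # default $/ is "\n": the "\r" survives

(my $stripped = $raw) =~ s/\s+$//;       # removes spaces, "\r" and "\n" alike

(my $eol_only = $raw) =~ s/\r?\n\z//;    # removes just the line ending

print "chomp:        ", visible($chomped),  "\n";
print "s/\\s+\$//:     ", visible($stripped), "\n";
print "s/\\r?\\n\\z//: ", visible($eol_only), "\n";
```

chomp leaves the "\r" behind, which is the ^M that then ends up in the output file; s/\s+$// removes it but also eats the two trailing spaces.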

Re: ^M chars in output file
by Tux (Canon) on Nov 22, 2010 at 16:55 UTC

    As a side note, both \n and \r\n are valid line endings for CSV files. And both will automatically be picked up by perl's CSV parsing/writing modules Text::CSV_XS and Text::CSV.


    Enjoy, Have FUN! H.Merijn