djbryson has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to chomp the newline characters out of a file. I finally realized why chomp isn't working: it's not recognizing the newline character. When I look at this text file in Crimson Editor, the newline character looks like a square. In Notepad I see nothing. Just hitting Enter in Crimson shows no character, so the square character seems to be different from a regular carriage return. Ideas?

Replies are listed 'Best First'.
Re: chomp not working
by toolic (Bishop) on Dec 06, 2007 at 21:56 UTC
    The chomp documentation specifies that the function "removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR)", which defaults to the newline character, \n.
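
A minimal sketch of what that means in practice, assuming the default $/ of "\n" (the sample strings are made up for illustration): chomp strips one trailing "\n" and nothing else, so a carriage return before it, or a newline in the middle of the string, is left alone.

```perl
use strict;
use warnings;

# With the default $/ of "\n", chomp strips a single trailing newline
# and returns the number of characters it removed.
my $line    = "test\n";
my $removed = chomp $line;
print "removed=$removed line=[$line]\n";

# A carriage return before the newline is not part of $/, so it survives.
my $crlf = "test\r\n";
chomp $crlf;
printf "last char after chomp: %d\n", ord substr($crlf, -1);

# Newlines in the middle of a string are never touched.
my $multi = "one\ntwo\n";
chomp $multi;
print "embedded newline kept\n" if $multi eq "one\ntwo";
```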

    I'm not sure what those characters are, but when I am trying to identify unusual characters, I use the ord function:

    #!/usr/bin/env perl
    use warnings;
    use strict;

    my $str = 'abcde';
    for (split //, $str) {
        print "$_:", ord $_, ":\n";
    }
    Once you identify the characters, it will be easier to figure out a way of eliminating them.
      Good idea. I shortened the file to 2 lines to shorten the output, here's the output:
      t:116 e:101 s:115 t:116 :13 :13 :10 t:116 e:101 s:115 t:116 :13 :13 :10
      So there are 3 characters at the end of each line: 13, 13, 10? If you're curious, here's where I got the file: link. It's XML. I use a "get" to pull the content into a variable, then write the variable to a text file.
Re: chomp not working
by johngg (Canon) on Dec 06, 2007 at 22:05 UTC
    If you are on a *nix-like environment you could look at the file using od to get an idea of what line terminator is actually being used. Failing that, write a quick script to read the first couple of hundred characters from your file into a buffer and then do something like

    print qq{@{ [ ord $_ ] } => $_\n} for split m{}, $buffer;

    This will give you the ordinal value for each character and the character itself, one per line. You should be able to spot the line terminator from that. Then you could set $/ (see perlvar) to what you have found so that chomp will work.
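
As a rough sketch of that last step, assuming the terminator you found is CR CR LF (13, 13, 10) and that the string holds those bytes unmodified (the sample data is hypothetical), $/ can be localized around the chomp:

```perl
use strict;
use warnings;

# Assumption: the terminator found by inspection is CR CR LF (13, 13, 10).
my $line = "test\r\r\n";

{
    # Localize the change so the rest of the program keeps the default "\n".
    local $/ = "\r\r\n";
    chomp $line;
}

# Print the remaining ordinals to confirm that nothing is left over.
print join(' ', map { ord($_) } split //, $line), "\n";
```

Using local keeps the change confined to the block, so later reads and chomps elsewhere still see the default separator.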

    I hope this is useful.

    Cheers,

    JohnGG

      . . . and if you're not on a *NIX-y environment you can always look for od from something like Cygwin, or use the Perl version from ppt.

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

Re: chomp not working
by Joost (Canon) on Dec 06, 2007 at 21:51 UTC

      That's not going to help. $/ already is "\012" on Windows.

      When reading, "\015\012" is converted to "\012" in one of the PerlIO layers, before readline gets a hold of the data, and therefore before $/ is applied.

      When writing, "\012" is converted to "\015\012" in one of the PerlIO layers, after print sends off the data, and therefore after $\ is applied. That's why "\n" ("\012") is used instead of "\r\n" ("\015\012") when ending a line.

      Maybe binmode is on, in which case the "\015\012" wouldn't be changed to "\012".
      Maybe the file is in the old Mac format ("\015").
      Maybe the file is in some corrupted format ("\015\015\012", which would look like "\015\012" when $/ is applied).
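
One way to see the difference between those cases is to write a small file with known bytes and read it back through different layers. This is only a sketch: the temporary file, the first_line helper, and the CR CR LF sample data are assumptions made for illustration, and the :crlf layer is requested explicitly so the behaviour is the same off Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two lines ending in CR CR LF, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "test\r\r\n" x 2;
close $out or die "close failed: $!";

# Hypothetical helper: read the first line through the given layer.
sub first_line {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "open failed: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

# Compare what chomp leaves behind with and without CRLF translation.
for my $layer (':raw', ':crlf') {
    my $line = first_line($layer);
    chomp $line;
    printf "%-5s -> %s\n", $layer, join ' ', map { ord($_) } split //, $line;
}
```

Read through :raw, both carriage returns are still there after chomp; read through :crlf, one "\015\012" pair has already been collapsed, so a single 13 is left behind.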

Re: chomp not working
by HeatSeekerCannibal (Beadle) on Dec 07, 2007 at 17:59 UTC
    Hi

    If you're passing files back and forth between *nix and Windows, it may be that the end-of-lines are messed up.

    Have you tried dos2unix?

    Best of luck!

    Heatseeker Cannibal
      I think I know why chomp wasn't working. chomp removes the newline character only from the end of the string. My file was all one string, so it was only removing the last newline character, not the newline characters throughout.
      solution: $content =~ s/\r\n//g;
      Thanks guys
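
For data like the 13, 13, 10 sequence reported earlier in the thread, a pattern that allows any run of carriage returns before the line feed may be safer. A small sketch, assuming $content holds the whole download as one string (the sample data is made up):

```perl
use strict;
use warnings;

# Assumption: $content holds the whole download as one string.
my $content = "test\r\r\ntest\r\r\n";

# s/\r\n//g leaves one CR per line behind when the data has CR CR LF.
(my $partial = $content) =~ s/\r\n//g;

# Matching any run of CRs before the LF removes the full terminator.
(my $clean = $content) =~ s/\r*\n//g;

printf "partial: %d chars, clean: %d chars\n", length($partial), length($clean);
```

The same ord check used above can confirm whether any stray 13s remain after the substitution.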