thesundayman has asked for the wisdom of the Perl Monks concerning the following question:

Well, this is a piece of code I use and it works well :-). But I can't understand how it works. Can anyone help?

When one stores text in an MSSQL server text field, it replaces the newlines with \r\r\n, and that confuses Perl when I take it out again and parse it. However, the following code fixes it. I take the text field from SQL Server and store it in a file, then I apply the following code to it, and the text file turns back to normal:

open(IN, "$ARGV[0]") || die "unable to open $ARGV[0]";
open(OUT, ">$ARGV[1]") || die "unable to open $ARGV[1]";
binmode(OUT); # set output mode as binary
while (<IN>) {
    if (/^\n$/) {
        print OUT "\r\n"; # if this is the record separator, print a proper line
    }
    else {
        print OUT; # else just print line with a CR
    }
}
close(IN);
close(OUT);

But how does that work?

Re: Can anyone figure how this works?
by chromatic (Archbishop) on Sep 29, 2001 at 22:40 UTC
    It opens an input file (first command line argument) for reading and an output file (second command line argument) for writing. Next, it loops through the lines of the input file. The pattern match just verifies that the line is only a newline character. If so, it doesn't print the line to the output file; it prints the standard DOSish \r\n combination instead. Otherwise, it prints the line verbatim.

    Yeah, the explanation's longer than the code.

    Update: Let's further compact tachyon's replacement into a one-liner: perl -pi.bak -e "s/^\n$/\r\n/" <filename>

    ©

Re: Can anyone figure how this works?
by tachyon (Chancellor) on Sep 29, 2001 at 22:51 UTC

    Presumably you call this script like this:

    $ fix.pl wrong.data fixed.data

    Here is a blow by blow:

    # command line arguments are available to the script in the
    # @ARGV array. Thus the first argument is in $ARGV[0], the
    # second in $ARGV[1]....

    # Open the file specified in the first command line arg for reading
    open(IN, "$ARGV[0]") || die "unable to open $ARGV[0]";

    # Open the file specified in the second command line arg for writing
    open(OUT, ">$ARGV[1]") || die "unable to open $ARGV[1]";

    # stop perl making automatic \r\n => \n or \r => \n line ending
    # conversions which are required on Win32 and Mac respectively
    binmode(OUT); # set output mode as binary

    # now iterate over our input file one line at a time
    while (<IN>) {
        # if we have a line that contains only "\n" - ie a blank line
        if (/^\n$/) {
            # then we print "\r\n" instead of the existing "\n" into our output file
            print OUT "\r\n";
        }
        # otherwise just print out the totally unaltered line
        else {
            print OUT; # else just print line with a CR
        }
    }

    # close the input and output files
    close(IN);
    close(OUT);

    If you want a short way to do the same, this will do it with an inplace edit. You call it like this:

    $ fix.pl data

    The data in data will get munged and a backup called data.bak will be made. The backup will contain the original data, the argument file the modified data.

    #!/usr/bin/perl -i.bak -w
    while (<>) {
        s/^\n$/\r\n/;
        print;
    }
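    For anyone wondering what -i.bak is doing there, here is a rough sketch of the same in-place edit written as a subroutine. It relies on the $^I variable, which is what the -i switch sets. The name fix_in_place is made up for this illustration; it is not part of the script above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: edit $file in place, keeping the original as "$file.bak".
    sub fix_in_place {
        my ($file) = @_;

        # Setting $^I and @ARGV has the same effect as running with -i.bak
        # and passing $file on the command line.
        local ($^I, @ARGV) = ('.bak', $file);

        while (<>) {
            s/^\n$/\r\n/;   # a line that is only "\n" becomes "\r\n"
            print;          # under $^I this goes to the rewritten file
        }
        return;
    }

    # Process every file named on the command line.
    my @files = @ARGV;
    fix_in_place($_) for @files;
    ```

    While the loop runs, perl has renamed the original to data.bak and print writes to a new file under the old name, which is why nothing shows up on the terminal.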

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      My only question is with regard to inplace edit. How does it function with regard to Binmode? Or has it been left out as it appears unneeded?

      Slightly confused, :-)

      Yves
      --
      You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

        In unix the default line ending is a line feed LF (\n or \012 or 0x0A). In DOSWin it is carriage return line feed CRLF (\r\n or \015\012 or 0x0D 0x0A). On a Mac it is CR (\r or \015 or 0x0D).
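        If you want to check those byte values yourself, something along these lines should do. It is only a sketch; hex_bytes is a helper name invented for this example, and the octal escapes are used so the output does not depend on what \n means on your platform.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: show each byte of a string as two hex digits.
        sub hex_bytes {
            my ($string) = @_;
            return join ' ', map { sprintf '%02X', ord } split //, $string;
        }

        my %ending = (
            'LF (unix)'     => "\012",
            'CRLF (DOSWin)' => "\015\012",
            'CR (Mac)'      => "\015",
        );

        for my $name (sort keys %ending) {
            printf "%-14s %s\n", $name, hex_bytes($ending{$name});
        }
        ```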

        When you read and write text files perl will automatically convert from its internal use of \012 for the line ending to whatever the system is using. On unix this means it does nothing, but on Mac and Windows conversions are made.

        Binmode tells perl not to convert line endings when reading or writing. In other words it does a raw read/write. On unix binmode has no effect as there is no conversion to make. On other systems the results are quite predictable.

        Perl uses \012 as the default line ending, so if you binmode an output filehandle such as STDOUT, $fh, etc. and then print "blah \n" you will write \012. This will not be correctly recognised as a line ending on DOSWin or Mac when trying to read this file - this is true for any program including perl programs. Unix, however, will read this file fine, as will perl running under unix.

        When you binmode an input filehandle such as STDIN, $fh, etc. you get a raw read. Thus on DOSWin if you read in a text file under binmode you will see that the line ending is \r\n. Visually this appears as double-spaced lines.

        How *your* script behaves is system dependent. When you read from a file you read one line at a time as defined by the default system line ending - internally this will be represented as \n. Let's assume you have a file that has \r\n line endings. On unix the lines you read will still end in \r\n because no conversion is done. Reading the same file under DOSWin will give you lines with only \n endings.

        With binmode on an output FH when you output \r\n that is what you get. With binmode off things will differ. On unix you will still get \r\n. On DOSWin you will get \r\r\n as the \n is converted to \r\n.
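        You can watch this happen on any system by asking for the translation explicitly. The sketch below assumes the PerlIO layers :crlf (the DOSWin text mode conversion) and :raw (same as binmode) and uses temp files from File::Temp. All of the helper names are invented for this illustration.

        ```perl
        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        # Write $bytes to a fresh temp file with no translation; return its path.
        sub raw_file {
            my ($bytes) = @_;
            my ($fh, $path) = tempfile(UNLINK => 1);
            binmode $fh;
            print {$fh} $bytes;
            close $fh or die "cannot close $path: $!";
            return $path;
        }

        # Read $bytes back through $layer and return the first line as perl sees it.
        sub first_line_read {
            my ($layer, $bytes) = @_;
            my $path = raw_file($bytes);
            open my $fh, "<$layer", $path or die "cannot read $path: $!";
            my $line = <$fh>;
            close $fh;
            return $line;
        }

        # Print $text through $layer and return the bytes that end up on disk.
        sub bytes_written {
            my ($layer, $text) = @_;
            my $path = raw_file('');
            open my $out, ">$layer", $path or die "cannot write $path: $!";
            print {$out} $text;
            close $out or die "cannot close $path: $!";
            open my $in, '<:raw', $path or die "cannot read $path: $!";
            local $/;                       # slurp the whole file
            my $bytes = <$in>;
            close $in;
            return $bytes;
        }

        # Show each byte as two hex digits.
        sub hex_bytes {
            my ($string) = @_;
            return join ' ', map { sprintf '%02X', ord } split //, $string;
        }

        my $line = "ab\015\012";            # "ab" followed by CRLF
        print "read  :raw   ", hex_bytes(first_line_read(':raw',  $line)), "\n";
        print "read  :crlf  ", hex_bytes(first_line_read(':crlf', $line)), "\n";
        print "write :raw   ", hex_bytes(bytes_written(':raw',  $line)), "\n";
        print "write :crlf  ", hex_bytes(bytes_written(':crlf', $line)), "\n";
        ```

        The last line of output is where the extra 0D comes from: the string already ends in \r\n and the text mode layer turns its \n into \r\n again.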

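        Unless you are writing text files across systems you do not need to worry too much. If you need binmode on the inplace edit script, binmode the output handle. Note that under -i print writes to ARGVOUT rather than STDOUT, so call binmode ARGVOUT inside the loop, once the file has been opened.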

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print