I wrote a program that runs on several platforms (Linux, Windows, and Mac). I tried to make it as system-independent as I could (using File::Spec for paths, etc.), but recently someone reported a bug. It turns out that she was creating one of the input files on a Mac, transferring it to a Windows machine, and running the program there (I didn't think of that...). The error occurred when the program tried to read the input file line by line. Since the program ran on Windows, it was expecting Windows newlines (\015\012). (Strictly speaking, $/ is still "\n", i.e. \012, even on Windows; the default :crlf I/O layer translates each \015\012 into "\n" as the file is read.) The input file was created on a Mac, though, so its lines ended in \015 alone. With no \012 anywhere in the file, the whole file was read as a single "line", and things turned ugly.
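To see why the file came back as one big "line", here is a minimal sketch that reads CR-only data with the default $/. The in-memory string is made-up sample data standing in for the real input file.

```perl
use strict;
use warnings;

# Made-up sample data with Mac-style (CR-only) line endings.
my $mac_data = "first\015second\015third\015";

# Read it the usual way, with $/ at its default of "\n" (\012).
open my $fh, '<', \$mac_data or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# The data contains no \012 at all, so readline never finds a
# record separator and returns everything as a single record.
printf "Read %d line(s)\n", scalar @lines;    # prints "Read 1 line(s)"
```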

Now I'm trying to figure out how to handle this situation. I reread perlport, as well as three earlier questions (two about newlines, and one on how to be NICE and Line Feeds), but they all seem to address writing specific newline characters to files, not reading them.

Here is what I came up with so far:

  1. Use $^O. If I understand it correctly, though, that only tells me about the system the program is running on, which (as this case shows) is not necessarily the system that created the file.
  2. Use a regex to match the newline character(s) in the file. I think this would require slurping the whole file and then doing something like if( $file =~ m/\015\z/ ) (which assumes the file ends with a newline; \z is needed here because a plain $ can also match just before a trailing \012, so m/\015$/ would match a Windows file too) or if( $file =~ m/\015(?!\012)/ ) (which doesn't assume that), setting $/ according to what matched, and re-reading the file line by line.
  3. Preprocess the input file to convert all newline characters to the current system's newline character. I experimented a little, and I think this will work:
    # CRLF (Windows), lone CR (Mac) or lone LF (Unix) -> "\n"; CRLF is tried first
    $file =~ s/\015\012?|\012/\n/g;

    I think this is my favorite solution, but it seems like a lot of extra overhead for each input file since the conversion only needs to occur once (assuming the input file is not then moved to another OS).
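Option 2 may not need the whole file slurped: peeking at the first few kilobytes is often enough to guess the convention. Below is a minimal sketch of that idea, assuming each file uses a single line-ending convention and that its first line ending lies well inside the first 4 KB. The helper names detect_newline and read_lines are hypothetical, not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line ending from the first 4 KB of raw bytes.
# Assumes one convention per file and that the first line ending falls well
# inside that chunk; returns undef if no line ending is found there.
sub detect_newline {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    defined read($fh, my $buf, 4096) or die "Cannot read $path: $!";
    close $fh;
    # The first CRLF, lone CR, or lone LF seen decides the convention.
    return $buf =~ /(\015\012?|\012)/ ? $1 : undef;
}

# Hypothetical helper: read all lines using the detected separator.
sub read_lines {
    my ($path) = @_;
    my $nl = detect_newline($path);
    # :raw keeps \015\012 intact, so it can serve as $/ on any platform.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/ = defined $nl ? $nl : "\n";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;    # chomp strips whatever $/ currently holds
    return @lines;
}

# Usage (the file name is hypothetical):
# my @lines = read_lines('input.txt');
```

Setting $/ this way avoids rewriting the data, at the cost of opening the file twice; the one-pass substitution in option 3 is simpler when the file is small enough to slurp anyway.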

Are there better ways of handling this?


In reply to Newlines: reading files that were created on other platforms by bobf