stevieb has asked for the wisdom of the Perl Monks concerning the following question:

Hi again Monks,

So I finally got around to troubleshooting one of my module's CPAN Testers failures on Windows platforms.

It became clear after a while that the cause of the failures is that I have sample data files, along with baseline expected-output files, for certain tests in my suite, and these are breaking because of line endings. I've worked around it in my Makefile.PL file with the following, and I'm just wondering if this is overkill and whether there's a saner way to make my tests cross-platform. Note that I'm just sweeping the entire /t directory as a test. I'll narrow the work to only the necessary files if there isn't a better way.

    # modules the snippet relies on
    use Cwd qw(getcwd);
    use File::Find;
    use Tie::File;

    if ($^O eq 'MSWin32'){
        print "\nPreparing unit tests for MSWin32 platform...\n\n";

        my $dir = getcwd();
        $dir .= "/t";

        # collect every plain file under t/
        my @files;
        find({
                wanted => sub {
                    return if ! -f;
                    my $file = $File::Find::name;
                    push @files, $file;
                },
                no_chdir => 1,
            },
            $dir,
        );

        # convert bare LF to CRLF, in place, in each file
        for (@files){
            tie my @file, 'Tie::File', $_ or die $!;
            for (@file){
                s/\n/\r\n/g;
            }
            untie @file;
        }
    }

Cheers,

-stevieb

Re: CPAN module unit test issues: OS line endings
by toolic (Bishop) on Sep 17, 2015 at 20:18 UTC
    Are you sure this is just a test issue, as opposed to a cross-platform mode that you want to support? Maybe if you share which module you are talking about with us, and we can see the specific tests and failure reports, you can get more specific advice.

      Thanks toolic,

      After I read your comment, I did further testing. This is my Devel::Examine::Subs module, which reads in files and performs work on them.

      The test data definitely needs to be converted on Windows, but you made a good point... that might break other things. On Windows, you don't often find a Unix-EOL'd file; however, I've noticed that all the Perl files in Strawberry Perl are definitely Unix.

      I'm going to rework the entire module, so it can identify at the file level what EOLs it has, then act appropriately.

        So what I ended up doing is setting up a _end_of_line() method, which is called by either one of the two methods that handle the work of managing file names. This method peeks into the file, and sets an environment variable to the file's EOL ('\n' or '\r\n').

        The methods deeper into the application have been modified to use that line ending in the environment variable when they open the respective file later on in the process.

        Works pretty slick.
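For readers who want to try the same idea, here is a minimal sketch of that kind of peek. The name `detect_eol` is hypothetical, and unlike the `_end_of_line()` method described above it simply returns the line ending instead of setting an environment variable. It assumes the first line ending in the file is representative of the whole file.

```perl
use strict;
use warnings;

# Hypothetical helper: return "\r\n" if the first line of the file ends in
# CRLF, otherwise "\n" (which is also the fallback for an empty file).
sub detect_eol {
    my ($path) = @_;

    # :raw disables any CRLF translation, so we see the bytes as stored
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $first = <$fh>;
    close $fh;

    return (defined $first && $first =~ /\r\n\z/) ? "\r\n" : "\n";
}
```

A file with mixed line endings would be reported according to its first line only, so this is a heuristic, not a guarantee.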

        On Windows, you don't often find a Unix-EOL'd file; however, I've noticed that all the Perl files in Strawberry Perl are definitely Unix.
        Are you sure? Although I am very rarely using Perl under Windows and may well be wrong on that, I do not think this is correct.

        Just a quick try under Windows:

        C:\Users\Laurent>perl -e "print qq{foo\nbar}" > foobar.txt
        C:\Users\Laurent>type foobar.txt
        foo
        bar
        Looking at the hexadecimal content of foobar.txt:
        66 6F 6F 0D 0A 62 61 72 ; foo..bar
        Although I used only "\n" in the script, Perl was clever enough to transform it into a windowish "\r\n" (0D 0A) end of line character combination.

        And, as far as I understand, Perl will also be clever enough, when reading a file and if detecting that it is working under Windows, to look for "\r\n" combinations as ends of lines, so that the whole thing is in fact transparent to the user: you are using "\n" for record separators and chomp, but Perl knows that it should be understood as "\r\n" if the OS is Windows.

        Trouble may occur when processing a Windows-generated file under Unix or vice versa, but if you're consistently using the same OS, Perl will essentially do what you need under the hood, without you even noticing it. So you may think that the Strawberry Perl generated files are Unix-like, but that is most probably not the case.

        I think that you need to understand that to figure out how (and whether) you need special processing for your module under Windows.
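The translation described above can be reproduced on any platform by naming the PerlIO layers explicitly. The sketch below is only an illustration: the helper names `write_with_layer` and `hex_dump` are made up for this example, and it asks for `:crlf` by hand, whereas on Windows that layer is applied to text-mode handles by default.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text to $path through the given PerlIO layer (":crlf" or ":raw").
sub write_with_layer {
    my ($path, $layer, $text) = @_;
    open my $fh, ">$layer", $path or die "Cannot write $path: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
}

# Return the file's bytes as space-separated hex, read with no translation.
sub hex_dump {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# With :crlf, each "\n" is written as 0D 0A
write_with_layer($path, ':crlf', "foo\nbar");
print hex_dump($path), "\n";    # 66 6F 6F 0D 0A 62 61 72

# With :raw, "\n" stays a single 0A byte
write_with_layer($path, ':raw', "foo\nbar");
print hex_dump($path), "\n";    # 66 6F 6F 0A 62 61 72
```

The same layer works in the other direction: a handle opened with `<:crlf` hands CRLF pairs back to the program as a plain "\n".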

Re: CPAN module unit test issues: OS line endings (how?)
by tye (Sage) on Sep 18, 2015 at 01:19 UTC

    How did you write such fragile or broken code that it breaks in the face of different (but native) line endings despite Perl pretty much taking care of that for you automatically? :)

    I could see having a problem on an old Mac if you included test data in files having non-old-Mac line endings (but perhaps Perls on old Macs handle that case better than I would expect based on the line-ending logic that I am aware of as I have not looked at the old-Mac-specific parts of Perl's source code). And I could see having minor problems when including data files having Windows line endings and then using them on Unix (if you don't always strip trailing whitespace from text lines, as is wise).

    With your (subsequent) mention of Devel::Examine::Subs and finding test failures on Windows (for 10 versions back), the specific failure cases appeared to me to be types of failures that would be rather far downstream from the direct impact of different line endings, so I didn't continue spending time trying to figure out how the failures happen. Perhaps you could just describe that to us?

    - tye        

      The problem stemmed from the fact that Tie::File (which I use in this module) switches its record separator automatically depending on whether it is on *nix or Windows. So when my test data files were downloaded from CPAN on Windows, they still had Unix line endings and weren't being read in correctly.

      In the end, before I open a file with Tie::File, I check to see what type of line endings it has, then I set Tie::File's recsep parameter to either \n or \r\n depending on the situation.
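A minimal sketch of that approach, assuming the first line ending in the file is representative. The helper name `tie_with_detected_eol` is hypothetical and is not taken from Devel::Examine::Subs; `recsep` and `mode` are documented Tie::File options. The file is tied read-only here so that the example never modifies it.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Hypothetical helper: tie an array to $path read-only, using the line
# ending found on the file's first line as the record separator.
sub tie_with_detected_eol {
    my ($path) = @_;

    # peek at the raw bytes of the first line to pick the separator
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $first = <$fh>;
    close $fh;
    my $recsep = (defined $first && $first =~ /\r\n\z/) ? "\r\n" : "\n";

    tie my @lines, 'Tie::File', $path, recsep => $recsep, mode => O_RDONLY
        or die "Cannot tie $path: $!";

    return \@lines;    # records come back without the separator
}
```

Dropping `mode => O_RDONLY` would allow writes through the array, with the detected separator appended to each record that is written.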

      Note that I hardly ever use Windows, so didn't think to test it on that platform, and of course we haven't been getting Testers reports because of the issues.

      I'm going to look at Tie::File tomorrow to check whether I'm missing something, or whether automatic record separation can be incorporated within it (just \n, \r\n).

        Why are you using Tie::File? I can't recommend that module as it is a fine example of getting a slight bit of superficial simplicity at the expense of way too much hidden complexity. Such things too often end up biting you before you are done (as happened here).

        I only skimmed a bit of the code. The only actual use of a tied array that I noticed was:

        @{$subs{$file}{TIE_file}} = @TIE_file;

        Which seems to provide absolutely zero benefit from the use of that module. That does nothing more than what a simple open and then @{...} = <$fh>; would do (except it is less efficient and leverages way more hidden complexity which leads to fragile surprises like not dealing well with line endings).

        That code also slurps the entire file contents into memory. This limits the size of problems that can be effectively handled by your module.
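As a rough illustration of the plain-open alternative, here is a sketch that reads one line at a time instead of slurping, and strips either kind of line ending as it goes. The helper name `each_line` is hypothetical; this is a variant of the `@{...} = <$fh>` idea above, not code from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per line, with the line ending
# (LF or CRLF) already removed, holding only one line in memory at a time.
sub each_line {
    my ($path, $callback) = @_;

    # :raw means no platform-specific translation; we handle both endings
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip "\n" or "\r\n", whichever is present
        $callback->($line);
    }
    close $fh;
    return;
}

# Usage: collect lines into an array, or process them without storing them.
# my @lines;
# each_line('some_file.pl', sub { push @lines, $_[0] });
```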

        - tye        

      How did you write such fragile or broken code that it breaks in the face of different (but native) line endings despite Perl pretty much taking care of that for you automatically?

      It could well be that tar is the fly in the ointment.
      When downloading from CPAN one is generally grabbing tarballs, and the text files in those tarballs most commonly have *nix line endings.
      Most Windows tar utilities can, I think, be configured to convert the line endings to Windows format but I've been bitten by configuring tar that way. (On rare occasions it would consider a binary file to be text.)

      As a consequence of that, I, for one, always untar CPAN tarballs "as is" on Windows, which means that the unpacked text files most commonly contain *nix endings.
      And I doubt that I'm the only person doing that.

      This makes it difficult for a CPAN author to know whether the unpacked distro on a Windows machine will have *nix or Windows line endings, and I think the best general advice is to construct things such that there's no need to know this.
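One way to follow that advice in a test suite is to normalize line endings on both sides before comparing, so the result is the same however the tarball was unpacked. This is a sketch under that assumption; the helper name `slurp_normalized` and the file names in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file as raw bytes and convert every CRLF
# (or bare CR) to LF, so files compare equal regardless of line endings.
sub slurp_normalized {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    $content = '' unless defined $content;    # empty file
    $content =~ s/\r\n?/\n/g;
    return $content;
}

# Usage in a test file, with Test::More loaded:
# is( slurp_normalized('t/got.txt'),
#     slurp_normalized('t/expected.txt'),
#     'output matches baseline regardless of line endings' );
```

This only helps when line endings are not themselves part of what is being tested.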

      Cheers,
      Rob
        Most Windows tar utilities can, I think, be configured to convert the line endings to Windows format but I've been bitten by configuring tar that way.

        Getting either Unix or Windows line endings in text files when on Windows should not present a problem (to mundane Perl code).

        - tye