maybeD has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that was written in ActivePerl and works as expected there. It has also been tested on a UNIX machine and works fine there too. However, it does not work on another user's computer, which uses Cygwin rather than ActivePerl. The part of the script that does not work as expected is as follows:
    sub match_randomized_lists {
        $count = 0;
        open (RANDOMIZED_LIST, $randomized_list) || die print "Script cannot open $randomized_list";
        my @randomized_list = <RANDOMIZED_LIST>;
        print $randomized_list;
        print "\n";
        print @randomized_list;
        print "\n";
        foreach my $list_to_match (@array_of_lists_to_match) {
            $count = $count + 1;
            print OUTPUT "NUMBER ";
            print OUTPUT $count;
            $match_count = 0;
            foreach my $randomized_entry (@randomized_list) {
                chomp $randomized_entry;
                if ($list_to_match =~ /$randomized_entry/) {
                    $match_count = $match_count + 1;
                }
            }
            print OUTPUT ": ";
            print OUTPUT $match_count;
            print OUTPUT "\n\n";
        }
    }
The code is intended to match each entry in the randomized list against each list of entries in @array_of_lists_to_match. Every time an entry matches, the match counter increases by 1.

This way, for each randomized list, I get the total number of matches against each list of entries. This is printed to an output file, and further calculations are carried out on these totals at a later stage.

In ActiveState Perl and on UNIX, the subroutine works as expected and finds matches. Under Cygwin, it never finds any matches.
Before I resort to installing Cygwin on my own computer: are there any known issues between Cygwin and anything in my code?

Re: Doesn't work in Cygwin
by BerntB (Deacon) on Nov 16, 2005 at 11:58 UTC
    If it worked with ActiveState but not with Cygwin, the first thing I'd check is the chomp.

    Check that it isn't getting confused between one- and two-character line endings. (Look at what you have after the chomp: either one character too few, or an extra end-of-line character left over.)

    (It is a potential mess. You can configure how Cygwin treats line endings. Also, if you wrote the data file on Unix and copied it over in binary mode, you'd get a single \n at the end of each line.)
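A minimal sketch of the failure described above, assuming the data file has DOS-style \r\n line endings and is read by a Perl whose input record separator $/ is a bare "\n" with no CRLF translation (roughly what happens on Cygwin without text-mode translation). The ID values are made up for illustration; they are not from the OP's data.

```perl
use strict;
use warnings;

# Hypothetical line from a DOS-format file, read with $/ = "\n"
# and no CRLF-to-LF translation.
my $line = "ABC123\r\n";
chomp $line;    # removes only the "\n"; the "\r" stays behind

# Hypothetical list that really does contain the ID.
my $list = "XYZ999 ABC123 DEF456";

# The pattern is now "ABC123\r", which does not occur in $list.
my $matched = ($list =~ /$line/) ? 1 : 0;

my $hex = unpack 'H*', $line;
print "after chomp (hex): $hex\n";    # the trailing 0d is the stray carriage return
print "matched: $matched\n";
```

Printing the chomped value in hex (or with visible delimiters) is a quick way to see the leftover carriage return that makes every match fail.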

    Update:
    I should have added some code that reads lines without the extra characters under Cygwin, as bart did. This works for me for fixing up files with mixed line endings (I use an old version of UltraEdit that mixes line endings! I don't use Windows much).

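    while (<$fh>) {
        chomp;                # remove the trailing "\n" ($/), if there is one
        s/\015$//;            # remove a leftover "\r" from a DOS "\r\n" ending
        print UT $_, "\012";  # write the line to UT with a Unix "\n" ending
    }
    close $fh;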
      To the OP: I agree with BerntB; the opinionated way Cygwin deals with line endings could really mess things up. If chomp doesn't work, you can always use
      tr/\r\n//d
      to remove every CR and LF character from the string.

      Furthermore, you check for duplicates with

      if ($list_to_match =~ /$randomized_entry/)
      which is an extremely poor way to handle this kind of thing. It won't match as intended if $randomized_entry unexpectedly contains regex metacharacters, it's prone to dying if the entry isn't actually a valid regexp, and it'll give you false positives when, for example, $randomized_entry contains "foo" and $list_to_match contains "if music be the food of love".
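The problems listed above can be reproduced with a short sketch. The IDs "A.1", "A+1" and "ID(7" are invented examples chosen to contain regex metacharacters; they are not taken from the OP's data.

```perl
use strict;
use warnings;

# False positive: "foo" is found inside "food".
my $fp = ("if music be the food of love" =~ /foo/) ? 1 : 0;

# Metacharacter causing a wrong match: "." matches any character,
# so the ID "A.1" also matches "AB1".
my $id   = "A.1";
my $meta = ("AB1 C22" =~ /$id/) ? 1 : 0;

# Metacharacter causing a missed match: "+" is a quantifier,
# so the ID "A+1" does not match the literal text "A+1".
my $literal = "A+1";
my $missed  = ("A+1 B22" =~ /$literal/) ? 1 : 0;

# An entry that is not a valid regex makes the match die at run time
# ("Unmatched ( in regex"); eval catches the error here.
my $bad = "ID(7";
my $ok  = eval { my $m = ("ID(7 X" =~ /$bad/); 1 } ? 1 : 0;

print "false positive: $fp, wrong match: $meta, missed: $missed, survived: $ok\n";
```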
        Do you have a suggestion for how you would handle the matching here? The randomized lists consist of a particular type of string (a reference ID), and the same kind of strings make up the elements of the list-to-match array.
        The last case you describe cannot happen with these, but I suppose some metacharacters could conceivably appear in the reference IDs.

        Would you suggest using \Q?
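One possible approach, offered as a sketch rather than a definitive fix: \Q...\E quotes metacharacters, which removes the wrong-match and crash cases, but on its own it still lets "foo" match inside "food". The example assumes (hypothetically) that each element of @array_of_lists_to_match is a whitespace-separated string of IDs; the data values are invented. It shows two alternatives: a quoted regex with whitespace-boundary lookarounds, and a plain hash lookup that avoids regexes entirely.

```perl
use strict;
use warnings;

# Hypothetical data: each list is assumed to be a whitespace-separated string of IDs.
my @array_of_lists_to_match = ("A.1 B22 C33", "AB1 food D44");
my @randomized_list         = ("A.1", "foo", "D44", "ID(7");

my (@counts_q, @counts_hash);
for my $list_to_match (@array_of_lists_to_match) {
    # (a) \Q...\E treats the entry as literal text; the lookarounds require
    #     whitespace or the string edge on both sides, so "foo" can't match "food".
    my $n = grep { $list_to_match =~ /(?<!\S)\Q$_\E(?!\S)/ } @randomized_list;
    push @counts_q, $n;

    # (b) No regex at all: split the list into IDs and test membership in a hash.
    my %in_list = map { ($_ => 1) } split ' ', $list_to_match;
    push @counts_hash, scalar grep { $in_list{$_} } @randomized_list;
}

print "@counts_q\n";
print "@counts_hash\n";
```

Under the whitespace-separated assumption both approaches give the same counts; the hash version is usually simpler and faster when the lists are long, because each list is split once instead of being scanned by one regex per entry.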

Re: Doesn't work in Cygwin
by Happy-the-monk (Canon) on Nov 16, 2005 at 11:42 UTC

    Update: The OP updated the code after reading this.

    At a short glance, it seems to me that you expect the variables $list and $list_to_match to be the same thing, but they aren't.

    But that's got nothing to do with ActiveState or Cygwin.

    Cheers, Sören

      Oops, that was an error I made when copying the code to PerlMonks (I changed the variable names while doing so). They are the same variable.