Karger78 has asked for the wisdom of the Perl Monks concerning the following question:

Bascally what I am doing is a file compairson and if the files are equal then remove it out of the array. however, it's not working. I am sure it's somthing really simple that I am doing wrong. If someone could shed some light, this would be appreciated.
foreach my $currentremotefilelist (@remoteFilelist) { foreach my $currentlocalfilelist (@return2) { #myfile.txt =~ myfile.txt_date my $currentremotefilelistsize = -s $currentremotefilel +ist; my $currentlocalfilelistsize = -s $logSite.$currentloc +alfilelist; $currentlocalfilelist = $logSite.$currentlocalfilelist +; if (($currentremotefilelist =~ /^$currentlocalfilelist +/i) && ($currentremotefilelistsize == $currentlocalfilelistsize)) { print "$currentremotefilelist and $currentlocalfil +elist also $currentlocalfilelistsize and $currentlocalfilelistsize Th +ey are Equal and will not be copied\n\n"; delete $return2[$arrayCounter]; $arrayCounter++; }else{print "$currentremotefilelist and $currentlocalf +ilelist also $currentlocalfilelistsize and $currentlocalfilelistsize +are not EQUAL!!\n";} } }

Replies are listed 'Best First'.
Re: Array question
by Corion (Patriarch) on Nov 20, 2009 at 15:51 UTC

    You're modifying the array you're iterating over. Don't do that.

    Personally, whenever doing a nested iteration over two arrays, I try to convert at least one the inner array into a hash and do a hash lookup into the inner array. This is usually faster and makes for simpler and shorter code too.

    Of course, the challenge is to find a good common key to use in the hash. In your case, it seems that you only want to act upon a file in @return2 if it has the same size as a file in @remoteFilelist, and if its name starts with the same letters as that file. Your comment indicates that all local files are named myfile.txt_date. So the key could be myfile.txt|localsize:

    # First, build the hash of names+size of all the local files my %local_file; for my $filename (@return2) { $filename =~ /^(.*)_(20\d{6})$/ or die "Couldn't find date part in filename '$filename'"; my $remote_name = $1; my $local_size = -s $filename; my $key = join "|", $remote_name, $local_size; $local_file{ $key } = $filename; }; for my $remote_file (@remoteFilelist) { my $remote_size = -s $remote_file; my $key = join "|", $remote_name, $remote_size; if (! exists $local_file{ $key }) { print "$remote_file does not exist locally, or exists locally +but with the wrong size.\n"; } else { print "$remote_file exists locally as $local_file{ $key }\n"; } };
      Thanks I will give that a try. You see the issue i was running into is that I have two array's with a list of just the file names. But they could be in a different order, or one array could have more then the other, so I had to find away that we take the first file name, compare against all the names in the 2nd array, then go back to the 2nd file in the 1st array, and compare against all the file names in the 2nd array. Does that make sense?

        I would try to get away from needing to rely on things like "order". Which is why I suggest you use a hash, and building a unique key to find the matches between the two lists.

        Again, finding the criteria that make a match between the two lists is something only you can know. But if you have found and formulated what makes two elements "identical" in your case, it makes finding the match much easier, because then you can use that "identity" as the key into a hash and check whether an element with that key exists or not.

Re: Array question
by AnomalousMonk (Archbishop) on Nov 20, 2009 at 17:53 UTC
    In addition to the problems addressed by others, I think there may be problems with the file name comparison regex in the OP. (Corion's reply Re: Array question also addresses this by using a different regex.)

    The regex  $string_A =~ /^$string_B/i is anchored only at the beginning of the string, so if  $string_A eq 'foobar' and  $string_B eq 'foo' the two strings will match. (The strings being compared in the OP seem to consist of a file name and another file name with date appended, so this behavior may be correct.) A complete (case insensitive) match could be achieved by matching with  /^$string_B$/i using the  $ anchor.

    Second, and perhaps more important, a string interpolation like  /^$string_B/ causes any regex metacharacters like  . ^ $ ( ) [ ] that are present in the string to be active in the compiled regex. (The most pesky metacharacter in this case is probably  . (dot): match any* character.) The usual practice would be to metaquote the intepolated string in some way, e.g.,  /^\Q$string_B\E/i or with the quotemeta function prior to interpolation.

    * Well, any except a newline unless the  //s 'dot matches all' regex modifier is used. See s in the Modifiers section of perlre.

Re: Array question
by gmargo (Hermit) on Nov 20, 2009 at 16:05 UTC

    While removing entries from an array like that is inadvisable, your code will not work as expected for two reasons:

    1. You only increment $arrayCounter after a delete, when it needs to be incremented on every loop.
    2. After a delete, on the next main loop, you attempt to reuse the deleted array element.

    The following should work as you expect:

    foreach my $currentremotefilelist (@remoteFilelist) { my $arrayCounter = -1; foreach my $currentlocalfilelist (@return2) { $arrayCounter++; # Increment on every loop iteration next if !defined $currentlocalfilelist; # Skip previousl +y deleted entries #myfile.txt =~ myfile.txt_date my $currentremotefilelistsize = -s $currentremotefilelist; my $currentlocalfilelistsize = -s $logSite.$currentlocalfileli +st; $currentlocalfilelist = $logSite.$currentlocalfilelist; if (($currentremotefilelist =~ /^$currentlocalfilelist/i) && ( +$currentremotefilelistsize == $currentlocalfilelistsize)) { print "$currentremotefilelist and $currentlocalfilelist al +so $currentlocalfilelistsize and $currentlocalfilelistsize They are E +qual and will not be copied\n\n"; delete $return2[$arrayCounter]; } else { print "$currentremotefilelist and $currentlocalfilelis +t also $currentlocalfilelistsize and $currentlocalfilelistsize are no +t EQUAL!!\n"; } } }

    Updated: moved counter to above the next

      As pointed out by Corion, undefining an array element and then continuing to iterate over it again and again is highly problematic, and probably a Bad Thing™.

      However, if this is really necessary, it is not necessary to maintain a separate array index, which can be highly confusing. The element in question is already an alias and can be undefined by an  undef $currentlocalfilelist; statement.

      Ok, maybe i should explaine a bit further as it appears that there are a few different way to accomplish this. I have two arrays. RemoteFiles and LocalFiles. So I would like to do a compare against them to see if the files are the same. If they are remove them from the Remotefiles array. Note the RemoteFiles have a time stamp added to the file name as I am checking to see if they have been copied over before. Does this help?
Re: Array question
by keszler (Priest) on Nov 20, 2009 at 15:44 UTC
    Does $currentremotefilelist begin with whatever is in $logSite?

    Is @remoteFilelist a list of filenames only, or is there path info as well? Same question for @return2.

    If both @remoteFilelist and @return2 are lists of files, why compare each filename from the first with every filename in the 2nd?