roperl has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of files, @tmparray, that I want to sort by modified time, so I'm using the code below:
@tmparray = sort { -M "$b" <=> -M "$a" } (@tmparray);
It works fine the majority of the time. However, if a file has been removed but its name still exists in @tmparray, I get the error below:
Use of uninitialized value in numeric comparison (<=>) at ...
How can I test whether a file exists before getting its modification time, and remove the entry from @tmparray?

Replies are listed 'Best First'.
Re: Sort by -M
by afoken (Chancellor) on Feb 02, 2018 at 18:49 UTC
    How can I test whether a file exists before getting its modification time, and remove the entry from @tmparray?

    You can't, reliably. You could use -e and grep, but there would still be a race condition between the -e check and the -M, during which another process could remove one or more of the files in @tmparray.

    Also, running two -M calls in every comparison is quite expensive. Consider using a Schwartzian transform to read each file's mtime only once. Inside the transform, you can filter out all files for which -M returned undef (grep, defined). But even then, files deleted while sorting would still appear in the output.
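
    A minimal sketch of that transform, assuming @tmparray from the question and keeping the oldest-first ordering of the original comparison:

    @tmparray = map  { $_->[0] }                # keep just the name
                sort { $b->[1] <=> $a->[1] }    # sort on the cached age
                grep { defined $_->[1] }        # drop files that have vanished
                map  { [ $_, -M $_ ] }          # read -M once per file
                @tmparray;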

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Sort by -M
by tybalt89 (Monsignor) on Feb 02, 2018 at 19:36 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1208341
    use strict;
    use warnings;
    use Data::Dump 'pp';

    my @tmparray = (<d.tree*>, 'nosuchfile');   # for testing purposes...
    pp \@tmparray;

    # replace
    # @tmparray = sort { -M "$b" <=> -M "$a" } (@tmparray);
    # with
    @tmparray = map  $_->[0],
                sort { $b->[1] <=> $a->[1] }
                grep defined $_->[1],
                map  [ $_, -M $_ ],
                @tmparray;
    # end replace

    pp \@tmparray;
      That looks like it'll work.
      Can you explain it? I'm not familiar with map.
        map is a special "short-hand" for a "foreach loop".
        All map statements can be expressed as a "foreach loop".
        Consider:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my @array = (1,2,3);
        @array = map { $_+1 } @array;
        print "@array\n";   # prints: 2 3 4

        foreach my $num (@array) {
            $num++;
        }
        print "@array\n";   # prints: 3 4 5
        Update: As an additional comment, I sometimes see a remark like "I didn't want to use a loop, so I used map". That is wrong: map is a looping instruction whether it looks that way in the source code or not. A map vs. a foreach loop winds up being basically the same in terms of how the input array is processed. I've seen half-page map statements, which in my opinion is an abuse of the feature. I use map for simple one-line transformation operations. Mileage and situations vary.
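
        For illustration only, here is the sort-by-age chain from above unrolled into explicit loops (assuming @tmparray from the original question is already populated); it is not a suggestion to write it this way:

        my @pairs;
        foreach my $file (@tmparray) {
            push @pairs, [ $file, -M $file ];          # cache each age once
        }
        @pairs = grep { defined $_->[1] } @pairs;      # drop vanished files
        @pairs = sort { $b->[1] <=> $a->[1] } @pairs;  # oldest first
        my @sorted;
        foreach my $pair (@pairs) {
            push @sorted, $pair->[0];                  # keep just the names
        }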
Re: Sort by -M
by ikegami (Patriarch) on Feb 02, 2018 at 21:11 UTC

    You'll need to perform the stat calls before the sorting.

    @a = map  $_->[0],
         sort { $b->[1] <=> $a->[1] }
         grep defined($_->[1]),
         map  [ $_, -M $_ ],
         @a;

    You might even get a speed boost from the reduced number of stat calls!

    [Oops, this is basically identical to tybalt89's post, but it hadn't been posted yet when I started composing this post.]

Re: Sort by -M
by Marshall (Canon) on Feb 02, 2018 at 20:46 UTC
    Here was my first thought (which has problems): since the sort routine can be arbitrarily complex, you could set the -M time to zero if the file doesn't exist. That way there would be no error in the sort, and non-existent files would appear at the beginning of the sort order (or you could use a huge value and force them to the end). One issue I see is a file going away while you are sorting: this could cause an unstable sort, since the "M time" for one of the files could change (go from some normal value to zero) during the sort and affect the comparison function.
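
    A minimal sketch of that sentinel idea, using the defined-or operator (Perl 5.10+) so the comparison never sees undef; which end the missing names land on depends on the sentinel chosen:

    # Substitute a fixed age when -M returns undef for a vanished file.
    # Use 0 here, or a huge value such as 9**9**9 for the other end.
    @tmparray = sort { (-M $b // 0) <=> (-M $a // 0) } @tmparray;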

    As afoken++ points out, a Schwartzian transform would be both more efficient CPU-wise and would avoid the unstable-comparison problem described above, because the M time values are calculated only once, before the sort commences.

    No matter what you do, there is the possibility that some file in your sorted list will no longer exist by the time you come to process that list. I think the best you can do is be aware that some files in the sorted list may not exist after sorting. Adjust your sort comparison function so that no errors can occur during the actual sort and these non-existent files appear at one end of the sort or the other. Of course, after you create the [filename, Mtime] array for the Schwartzian transform, you can grep out abnormal Mtime entries. However, that will not prevent non-existent files from appearing in the final sorted result - you have to handle that no matter what.

    Update: I was going to post some code, but I saw the code by tybalt89++. His code looks fine to me and jibes with my advice. Be aware that even after the sort, some non-existent files may appear in the sorted list.

Re: Sort by -M
by Laurent_R (Canon) on Feb 02, 2018 at 19:01 UTC
    It works fine the majority of the time. However, if a file has been removed but its name still exists in @tmparray, I get the error below
    Can you please explain why @tmparray has files which do not exist in your directory?

    Depending on the reason, you might be able to filter the items of @tmparray before the sort (for example with a grep using the -f file test, or some other means).
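
    A minimal sketch of that kind of filter; it only narrows the race window, it does not close it:

    # Keep only names that are still plain files right now; a file can
    # still vanish between this grep and the sort.
    @tmparray = grep { -f $_ } @tmparray;
    @tmparray = sort { -M "$b" <=> -M "$a" } @tmparray;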

      I have the code running as a daemon process on two servers on a shared NFS mount.
      So either process can find the files and put the list of files into its own @tmparray.
      Each process will then lock the files for further processing. So only one process is able to lock and process the files.
      When a file is removed by the other process between the finding and the sorting, I get the error.
      The code continues to work, but it occasionally spits out these errors into my logs.
        If you have two concurrent processes, then I'm afraid there isn't much you can do to prevent this from happening (at least within the context of what you've explained). The best you can do is probably to reduce the likelihood of it happening by filtering the list with a grep.

        However, you might ask yourself the following questions: is it really necessary (or useful) to have two concurrent processes running on the same set of files? If you really want two concurrent processes, can't you "specialize" them, i.e. tell them to work on different file sets (based, for example, on the file names, file owner or age, or some other property of the files)? I cannot help but think that there is likely something wrong in your process if these two concurrent processes work on the same files, process them, and delete some of them.

        Another solution would be, when you build your list, to collect into an AoA or an AoH not only the file names, but also their age. Then your sort could be made on the filename/age pairs you've collected, and you would no longer have a problem when sorting them. But you'd still be processing names of files that no longer exist; whether that makes sense depends on the bigger picture, which we don't know.
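
        A minimal sketch of that idea with an AoH, assuming a hypothetical spool directory $dir and a readdir-based listing (not necessarily how the daemons actually build their lists):

        my $dir = '/some/spool/dir';
        opendir my $dh, $dir or die "Cannot open $dir: $!";
        my @files;
        for my $name (readdir $dh) {
            my $path = "$dir/$name";
            next unless -f $path;                         # skip . .. and anything already gone
            push @files, { name => $path, age => -M _ };  # -M _ reuses the stat from -f
        }
        closedir $dh;
        my @sorted = sort { $b->{age} <=> $a->{age} } @files;  # oldest first, ages cached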

        BTW, these are warnings, not errors. I hate to say it, but if there is no consequence for your process, you might as well decide to ignore them or even to silence them (although I am very reluctant about that kind of decision; it's not what I would do in such a case).
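
        If silencing were the chosen route, the warning category can be disabled lexically, just around the sort, rather than globally; a minimal sketch using the original comparison:

        {
            no warnings 'uninitialized';    # only this category, only in this block
            @tmparray = sort { -M "$b" <=> -M "$a" } @tmparray;
        }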

Re: Sort by -M
by Anonymous Monk on Feb 05, 2018 at 06:37 UTC

    The problem lies neither in sort nor in -M, but in the fact that your data (external files/directories) is being modified after it is read (readdir?): more than one thread/process is making modifications to shared data.