colox has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have a puzzling issue with my program and I really hope you can guide me as usual. The code below is the part of my program that checks for files. It works perfectly fine if the directory being checked ($indir) is a local folder, but it seems to "freeze" (it does not throw any error) when I change the path to either a mapped drive (e.g. Z:) or an absolute path (e.g. //my_path/folders). Your input will be greatly appreciated.

    sub Get_FileList{
        @files=();
        @todel=();
        find(sub {push @files,$File::Find::name if (-f $File::Find::name and /\.*$/ and stat($File::Find::name)->mtime > $lastepoch);}, $indir);
        find(sub {push @todel,$File::Find::name if (-d $File::Find::name);}, $indir);
        $lastepoch = time; #Update last execution time for the next run
        $logger->info("New execution time update: $lastepoch.");
        $proccount = scalar(@files);
        $logger->info("Found $proccount new files since last run.");
    }

Replies are listed 'Best First'.
Re: Execution hangs on File::Find
by Eily (Monsignor) on Dec 04, 2017 at 11:34 UTC

    It sounds like execution just takes a while, probably because there are a lot of files on your mapped drive and access is slower than on a local drive (I'm guessing this is a network directory?). Maybe you can log progress (e.g. report the current position every 100 files) to show that something is still happening.

    One obvious optimization is to only traverse the tree once. And since this would make the "wanted" function a little more complex, it's easier to read if you don't define it directly in the call to find:

        {
            my $lastepoch = time;
            my @files;
            my @todel;
            my $counter;
            sub wanted {
                # Log progress every 100 entries
                $logger->info("Reached $File::Find::name") if ($counter++ % 100) == 0;
                push @files,$File::Find::name if (/\.*$/ and -f $File::Find::name and stat($File::Find::name)->mtime > $lastepoch);
                push @todel,$File::Find::name if (-d $File::Find::name);
            }
            sub Get_FileList {
                @files = ();
                @todel = ();
                find (\&wanted, $_[0]);
                $lastepoch = time;
                ...
                return (\@files, \@todel);
            }
        }
        my ($files, $todel) = Get_FileList($indir);
    I've also put the test on the file name first, it's a guess that it might be a little faster, because the name is already in memory and doesn't require drive/network access. And I'm not sure but it kind of looks like you have global variables, so I've shown you how you can have variables shared between functions without making them available everywhere.

      Thank you for the input. I did make the optimization you suggested and now traverse only once - good find. I also added the logging within the subroutine, but it still indicates that it is stuck on the top-level directory and never traverses it. I have the same number of files locally and on the network share, so I'm certain it is not the number of files. I was thinking I need to add network authentication to my code, but then I thought a mapped drive should behave like a local drive, as I've already entered the network credentials to create the drive mapping.

        Well, add more logs to find precisely where you are stuck:

            sub wanted {
                $logger->debug("Reached $File::Find::name");
                $logger->info(Carp::longmess("TRACE2"));
                if (-f $File::Find::name) {
                    $logger->debug("This is a file: checking time");
                    stat($File::Find::name)->mtime > $lastepoch or return;
                    push @files,$File::Find::name;
                } else {
                    $logger->debug("Not a file");
                    if (-d $File::Find::name) {
                        $logger->debug("This is a directory");
                        push @todel,$File::Find::name;
                    } else {
                        $logger->debug("Not a directory");
                    }
                }
                $logger->debug("Exiting");
            }
        BTW, /\.*$/ means that the name must have 0 or more dots at the end. This is true for all strings so this test is useless.
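
        The point about /\.*$/ can be checked with a small script. This is only a sketch: the sample names are made up, and the second pattern assumes the intent was "the name has an extension", which the original post does not confirm.

```perl
use strict;
use warnings;

# Hypothetical sample names, chosen only for illustration.
my @names = ('report.txt', 'README', 'archive.tar.gz', '');

# /\.*$/ asks for zero or more literal dots before the end of the string,
# so it matches every string, including the empty one.
my @always = grep { /\.*$/ } @names;

# Assumed intent (not stated in the thread): the name ends in a dot
# followed by at least one non-dot character.
my @with_ext = grep { /\.[^.]+$/ } @names;

print scalar(@always), " of ", scalar(@names), " names match the original pattern\n";
print "Names with an extension: @with_ext\n";
```

        If no filtering on the name is wanted at all, simply dropping the regex test from the condition has the same effect as the original pattern.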

Re: Execution hangs on File::Find
by jahero (Pilgrim) on Dec 04, 2017 at 11:51 UTC
    I have encountered similar behaviour when iterating over a rather large directory tree over the network, and I believe this *might* help a bit, although it is hard to say whether your problem is the same as mine was.
        sub Get_FileList{
            @files=();
            @todel=();
            ####### WORKAROUND START
            my $sloppy = ${^WIN32_SLOPPY_STAT};
            ${^WIN32_SLOPPY_STAT} = 1;
            ####### WORKAROUND END
            find(sub {push @files,$File::Find::name if (-f $File::Find::name and /\.*$/ and stat($File::Find::name)->mtime > $lastepoch);}, $indir);
            find(sub {push @todel,$File::Find::name if (-d $File::Find::name);}, $indir);
            $lastepoch = time; #Update last execution time for the next run
            $logger->info("New execution time update: $lastepoch.");
            $proccount = scalar(@files);
            $logger->info("Found $proccount new files since last run.");
            ####### REVERT WORKAROUND
            ${^WIN32_SLOPPY_STAT} = $sloppy;
            ####### REVERT WORKAROUND END
        }
    See ${^WIN32_SLOPPY_STAT}
      ++ed the workaround, but I'm wondering whether it's possible to use local ${^WIN32_SLOPPY_STAT} = 1; instead of saving/restoring the value manually?
        It should be possible (might test it in my code later on).
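
        A minimal sketch of the local variant asked about above. It is untested against the original program and rests on assumptions: that local behaves for ${^WIN32_SLOPPY_STAT} as it does for other special variables, and that the variable only matters on Windows builds of Perl that support it. The sub name get_file_list, its parameters, and the temporary directory are hypothetical and exist only to make the example runnable.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Single-pass collection with the sloppy-stat flag scoped by local.
sub get_file_list {
    my ($indir, $lastepoch) = @_;
    my (@files, @dirs);

    # local restores the previous value when this sub returns or dies,
    # so no manual save/restore is needed.
    local ${^WIN32_SLOPPY_STAT} = 1;

    find(sub {
        if (-d $_) {
            push @dirs, $File::Find::name;
        }
        elsif (-f _ and (stat _)[9] > $lastepoch) {
            # (stat _)[9] is the mtime from the stat already cached by -d
            push @files, $File::Find::name;
        }
    }, $indir);

    return (\@files, \@dirs);
}

# Demonstration on a throwaway directory (hypothetical contents).
my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/sub" or die "mkdir failed: $!";
open my $fh, '>', "$tmp/a.txt" or die "Cannot create file: $!";
close $fh;

my ($files, $dirs) = get_file_list($tmp, 0);
printf "%d files, %d directories\n", scalar @$files, scalar @$dirs;
```

        Compared with the manual save/restore, the value is also restored if find dies part-way through.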

      Thanks for sharing. I quickly tried it; however, it didn't make any difference. It is still stuck on the top directory without logging or throwing anything =(

        Then I guess the next step should be testing your code on a smaller directory tree, to see whether it hangs and/or where it spends most of its time.

        I find it hard to believe that File::Find would freeze, although I can hardly back up this claim with concrete evidence, so I would probably profile the run (Devel::NYTProf) on something small first.

        Good luck, perhaps someone with more knowledge than humble me shall help.

        What does "dir" on the mapped drive return from cmd.exe?
Re: Execution hangs on File::Find
by Dallaylaen (Chaplain) on Dec 04, 2017 at 11:26 UTC

    Just a couple thoughts.

    I would start with a debugging stacktrace:

        use Carp;
        $SIG{ALRM} = sub { Carp::cluck("HERE") };
        alarm 10;
        # ... now proceed with files

    This would print a stacktrace after 10 seconds of execution, providing pointers about where the code is.

    Alternatively you can bind to $SIG{INT} and press Ctrl-C to see where you are. Or you can use $logger->info(Carp::longmess("HERE")); instead of cluck.
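
    A rough sketch of the $SIG{INT} variant. The names $last_trace and slow_work are hypothetical, and the kill call merely simulates pressing Ctrl-C so the example can run unattended; signal behaviour on Windows differs from Unix, so treat this as an illustration rather than a guaranteed recipe.

```perl
use strict;
use warnings;
use Carp ();

my $last_trace = '';   # hypothetical holder so the trace can be inspected later

# On Ctrl-C, print (and remember) a stack trace instead of terminating.
$SIG{INT} = sub {
    $last_trace = Carp::longmess("Interrupted HERE");
    print STDERR $last_trace;
};

# Stand-in for the long-running traversal; the real code would call find() here.
sub slow_work {
    kill 'INT', $$;    # simulate pressing Ctrl-C, for demonstration only
    return 1;
}

slow_work();
```

    Because the handler replaces the default action, Ctrl-C no longer stops the script while it is installed; the handler can be extended to exit after printing if that is preferred.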

    Also I'm wondering whether the script has a use strict; in it. Because if it doesn't, debugging becomes twice as hard.

      I am invoking it within the sub and logging the result:

          sub Get_FileList{
              @files=();
              @todel=();
              $logger->info(Carp::longmess("TRACE1"));
              find (sub {
                  $logger->debug("Reached $File::Find::name");
                  $logger->info(Carp::longmess("TRACE2"));
                  push @files,$File::Find::name if (-f $File::Find::name and /\.*$/ and stat($File::Find::name)->mtime > $lastepoch);
                  push @todel,$File::Find::name if (-d $File::Find::name);}, $indir);
              $logger->info(Carp::longmess("TRACE3"));
              $lastepoch = time; #Update last execution time for the next run
              $logger->info("New execution time update: $lastepoch.");
              $proccount = scalar(@files);
              $logger->info("Found $proccount new files since last run.");
          }

      The log is below. I guess it really is getting stuck when it tries to traverse the mapped drive.

          04-12-2017 04:06:56:570 INFO TRACE1 at perl.pl line 85.
          04-12-2017 04:06:56:577 DEBUG Reached Z:/
          04-12-2017 04:06:56:577 INFO TRACE2 at C:/Strawberry/perl/lib/File/Find.pm line 358.
              File::Find::_find_dir(HASH(0x5174bd8), "Z:/", 1) called at C:/Strawberry/perl/lib/File/Find.pm line 236
              File::Find::_find_opt(HASH(0x5174bd8), "Z:/") called at C:/Strawberry/perl/lib/File/Find.pm line 760
              File::Find::find(CODE(0x4e8b7b8), "Z:/") called at perl.pl line 118
              main::Get_FileList() called at perl.pl line 85

      thank you. let me try this. this is something new to me.

Re: Execution hangs on File::Find
by colox (Sexton) on Dec 04, 2017 at 15:34 UTC

    Dear monks, as a follow-up to this: the logging indicates that the overhead is caused by File::Find's recursive search. I'm wondering whether there is also a recursive readdir. If so, could you share the syntax? I would like to do a comparative study of the performance.
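
    Perl's built-in readdir is not recursive, but a recursive walk can be built from opendir/readdir with an explicit stack. The following is a minimal sketch, not a drop-in replacement for the code above: walk_tree and the temporary test tree are hypothetical names used for illustration, and symlinks are skipped by choice so that link cycles cannot cause endless loops.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Iterative depth-first walk using opendir/readdir and an explicit stack.
sub walk_tree {
    my ($top) = @_;
    my (@files, @dirs);
    my @stack = ($top);

    while (defined(my $dir = pop @stack)) {
        my $dh;
        unless (opendir $dh, $dir) {
            warn "Cannot open $dir: $!\n";
            next;
        }
        push @dirs, $dir;
        for my $entry (readdir $dh) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$dir/$entry";
            if (-l $path) {
                next;                   # skip symlinks (files and directories alike)
            }
            elsif (-d $path) {
                push @stack, $path;     # visit this subdirectory later
            }
            elsif (-f _) {              # reuse the stat cached by -d
                push @files, $path;
            }
        }
        closedir $dh;
    }
    return (\@files, \@dirs);
}

# Demonstration on a throwaway tree (hypothetical names).
my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/sub" or die "mkdir failed: $!";
for my $name ("$tmp/a.txt", "$tmp/sub/b.txt") {
    open my $fh, '>', $name or die "Cannot create $name: $!";
    close $fh;
}

my ($files, $dirs) = walk_tree($tmp);
printf "%d files, %d directories\n", scalar @$files, scalar @$dirs;
```

    For the comparison, both this walk and the File::Find version could be timed on the same tree with the core Benchmark module or Time::HiRes.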
