in reply to File::Copy and file manipulation performance

Instead of scheduling via cron at predefined intervals, take a different route: a Perl script running in the background as a service, polling the directory for accumulated files, say every 15 seconds. If the number of files is below a given threshold (say 1000 files) and less than two minutes have passed since the files were last processed (moved around), do nothing. If the threshold is reached and less than two minutes have elapsed, perform the operation regardless. Otherwise, if more than two minutes have elapsed and the file count is still below the threshold, perform the operation anyway.
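
A rough sketch of that loop, where the directory path, the thresholds, and the process_files() routine are placeholders you would fill in yourself:

    use strict;
    use warnings;

    my $dir            = 'C:/incoming';   # watched directory (placeholder)
    my $file_threshold = 1000;            # act once this many files pile up...
    my $time_threshold = 120;             # ...or once two minutes have passed
    my $last_run       = time;

    while (1) {
        opendir my $dh, $dir or die "Cannot open $dir: $!";
        my @files = grep { -f "$dir/$_" } readdir $dh;
        closedir $dh;

        my $elapsed = time - $last_run;
        if (@files >= $file_threshold or (@files and $elapsed >= $time_threshold)) {
            process_files(@files);         # the actual move/copy work
            $last_run = time;
        }
        sleep 15;                          # poll every 15 seconds
    }

    sub process_files {
        my @files = @_;
        # hand the list off to a child process, or move the files right here
    }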

The operation will most likely consist of spawning a child process to do the task. If the files aren't actually moved out of the watched folder, the parent process could keep an internal list of what has already been handled and pass the relevant information down to its children.
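
A minimal sketch of that idea, assuming fork (which Perl emulates with threads on Windows) and a hypothetical move_files() routine for the real work:

    use strict;
    use warnings;

    my %seen;    # files the parent has already handed to a child

    sub dispatch {
        my @files = grep { !$seen{$_} } @_;   # skip anything already claimed
        return unless @files;
        $seen{$_} = 1 for @files;

        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                      # child: do the slow work
            move_files(@files);               # e.g. File::Copy::move calls
            exit 0;
        }
        # parent returns immediately and keeps polling
    }

    sub move_files {
        my @files = @_;
        # move each file to its destination here
    }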

The system could also be devised to be fail-safe, with a separate Perl process monitoring the working parent. If the parent fails, it is restarted automatically where it left off and the administrative team is alerted.
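
A bare-bones watchdog along those lines; the script path and the notify_admins() routine are placeholders:

    use strict;
    use warnings;

    my $worker = 'C:/scripts/mover.pl';     # the polling script above (placeholder)

    while (1) {
        my $status = system($^X, $worker);  # blocks until the worker exits or dies
        notify_admins("mover.pl exited with status $status, restarting");
        sleep 5;                            # brief pause before the restart
    }

    sub notify_admins {
        my ($msg) = @_;
        warn "$msg\n";                      # replace with a mail or pager notification
    }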


Re^2: File::Copy and file manipulation performance
by Fendaria (Beadle) on Dec 06, 2005 at 22:31 UTC

    I'm unfamiliar with running a perl script on windows as a service. Is there a link/page you can point me to that explains it?

    I'm also unsure about launching threads under perl on Windows but it is something I am considering tackling. My biggest hurdle is making sure the same work isn't done twice (two perl programs checking the same directory and trying to move the same files).

    Fendaria
      When saying "service" i meant with the broad sense of the word. As a background, persistant proccess, that optionally starts up when the machine boots. This can be achieved by utilizing the native Windows Services API either via GUI or with the instsrv.exe commandline utility. That step would most likelly involve feeding the above utility with the full path to perl along with its commandline, the script you want run as a service. If this doesn't work out, you can try the standard All Users>Startup folder that initializes everything upon boot or placiong this in a login script. Plenty of options aside the obvious manual launching.

      As for threads on Windows, things are pretty straightforward with recent versions of Perl. Just use threads and then do something like $thread = threads->new($coderef, @data) after reading up on the documentation.

      Of course, certain data can be shared amongst threads, either by passing data back and forth between them, or by keeping the data in the parent and handing it down to worker threads, maintaining an index of what has been taken care of and what is available for the next thread in line. I am not sure, but I believe this is the "Work Crew" threads model of operation.
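
      A minimal sketch of that model, assuming the core Thread::Queue module and a hypothetical move_one() routine doing the per-file work:

          use strict;
          use warnings;
          use threads;
          use Thread::Queue;

          my $queue = Thread::Queue->new;   # the shared "index" of pending work

          # a small crew of workers, each pulling files off the queue
          my @crew = map {
              threads->new(sub {
                  while (defined(my $file = $queue->dequeue)) {
                      move_one($file);      # placeholder for the actual move
                  }
              })
          } 1 .. 4;

          $queue->enqueue($_) for glob 'C:/incoming/*';   # hand work to the crew
          $queue->enqueue(undef) for @crew;               # one "stop" marker per worker
          $_->join for @crew;

          sub move_one {
              my ($file) = @_;
              # File::Copy::move($file, 'C:/outgoing/') or warn "$file: $!";
          }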

      The point of this approach is to stay miles away from the wall, making it impossible to hit. What you were proposing seems to me like speeding at 200 MPH and pulling the brakes 50 meters from the wall. Excuse the not-so-amusing analogy. :)

        I agree, the best solution is to not get into the situation in the first place. However, the script can't run 100% of the time (bugs, remote server downtime, etc.) and the files are always arriving (and the volume is expected to triple over the next few months). Plus the files tend to arrive in 'bursts' rather than evenly spread out.

        It just took 5 hours to move the 40,000+ files off the box, and I believe >75% of the time was disk I/O (slowed down by the OS and the sheer number of files) from copy, rename, and delete. Just looking at the problem at that level, I can't help thinking there is a better way to deal with it than what I am currently doing.

        Fendaria