
So, it takes you 1 minute for every 1000 files you work with. Put another way, you can process about 17 files/second. To me, this means you're hitting the fundamental limits of Perl. Perl is, frankly, a very slow language from a CPU perspective. That's not what it was optimized for. It's been optimized for developer speed.

So, I would put forward that you really have two options: fork and spread the work across several processes, or rewrite the CPU-heavy part in C.

I would try the forking option first. Look at Parallel::ForkManager. There are a number of ways you can divide the work among the children, depending on how your directories and files are laid out. But, that's what I'd do first.
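
A rough sketch of what that could look like, assuming one child per file and at most a handful of children at once. process_file() and process_in_parallel() are hypothetical names, and the line counting is only a stand-in for whatever the real script does with each file; the Parallel::ForkManager calls (new, run_on_finish, start, finish, wait_all_children) are the module's documented ones.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical per-file work: a stand-in for the real processing.
sub process_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close $fh;
    return $lines;
}

# Run process_file() over @$files with at most $max_procs children alive
# at any time. Returns the number of children that reported a failure.
sub process_in_parallel {
    my ($files, $max_procs) = @_;
    my $pm     = Parallel::ForkManager->new($max_procs);
    my $failed = 0;

    # Called in the parent every time a child has exited.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code) = @_;
        $failed++ if $exit_code != 0;
    });

    for my $file (@$files) {
        $pm->start and next;         # parent: child forked, go to next file
        my $ok = eval { process_file($file); 1 };
        $pm->finish($ok ? 0 : 1);    # child: exit with a status for the parent
    }
    $pm->wait_all_children;
    return $failed;
}
```

With two CPUs, something like process_in_parallel(\@files, 2) would be a reasonable starting point. If the files are small, forking once per directory or per batch of files rather than once per file keeps the fork overhead down.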

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re^4: the sands of time(in search of an optimisation)
by moritz (Cardinal) on Mar 04, 2008 at 12:56 UTC
    The first step should be to find out if CPU or I/O is the bottleneck.

    There's no point in optimizing CPU usage if the program is not blocking on CPU.

    You can just look at the CPU usage, and if it stays at a constant 100% during the program run, you know that CPU time is worth optimizing.

    If file I/O is the bottleneck you can try to experiment with different file systems, RAID, different hard discs etc.
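
One way to get the same information from inside the script, as a minimal sketch: compare the CPU time the process used with the wall-clock time that passed. cpu_share() is a hypothetical helper name; times is a Perl builtin and Time::HiRes is a core module. Note that times has a coarse resolution (typically 1/100 s), so this is only meaningful for runs of at least a few seconds.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $work and report how much of the elapsed wall-clock time this
# process spent on the CPU (user + system).
sub cpu_share {
    my ($work) = @_;
    my $wall_start = time;
    my ($user_start, $sys_start) = times;

    $work->();

    my ($user_end, $sys_end) = times;
    my $wall = time - $wall_start;
    my $cpu  = ($user_end - $user_start) + ($sys_end - $sys_start);
    return {
        wall  => $wall,
        cpu   => $cpu,
        share => ($wall > 0 ? $cpu / $wall : 0),
    };
}

# Example: a purely CPU-bound loop standing in for the real work.
my $r = cpu_share(sub { my $x = 0; $x += sqrt($_) for 1 .. 2_000_000; });
printf "wall %.2fs, cpu %.2fs, share %.0f%%\n",
    $r->{wall}, $r->{cpu}, 100 * $r->{share};
```

A share close to 100% suggests the run is CPU-bound; a much lower share means the process spent most of its time waiting, most likely on disk.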

      CPU usage goes to about 70-90%. The hardware used to test this is 2 x 900 MHz CPUs with 1024 MB of RAM.
Re^4: the sands of time(in search of an optimisation)
by spx2 (Deacon) on Mar 04, 2008 at 12:59 UTC
    What do you think about POE? Do you think it could also be used in this case to parallelize the processing?
      If Perl's speed is your problem and the reason you're moving to forking, why on earth would you do that forking with a massive Perl framework rather than a lightweight wrapper around fork?

      Also, moritz has a good point - have you determined if you're CPU-bound or I/O-bound? Forking or rewriting in C isn't going to help if your disk is pegged.

