in reply to hashes & threads

Using threads together with threads::shared (to share the hash) might help, or it might make things worse.

If your problem is that your script runs too slowly because you process a lot of files, or big files, try to find out whether that time is spent mostly reading or mostly processing. Just empty the body of the while loop, i.e. write while (<$fh>) {}, and see if the script runs significantly faster. If it does not, the bottleneck is reading the files, and chances are multiple threads won't be able to do anything about it: if all those folders are on the same device, all your threads will have to wait for that device to be available, and in the best case they will just run in sequence (in a worse case they may force the device to keep jumping from one file to another).
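A minimal sketch of that check, timing the two variants with the core Time::HiRes module. The glob pattern and the regular expression are assumptions standing in for your actual file layout and match:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical list of files to scan; substitute your own directories.
my @files = glob("data/*/config.txt");

# Pass 1: read only, empty loop body -- measures pure I/O time.
my $t0 = [gettimeofday];
for my $file (@files) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (<$fh>) { }    # discard every line
    close $fh;
}
my $read_time = tv_interval($t0);

# Pass 2: read and match -- measures I/O plus processing time.
$t0 = [gettimeofday];
my %hash;
for my $file (@files) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (<$fh>) {
        $hash{$file}{area} = $1 if /^area=(.*)/;
    }
    close $fh;
}
my $total_time = tv_interval($t0);

printf "read only: %.3fs, read+match: %.3fs\n", $read_time, $total_time;
```

If the two numbers are close, you are I/O bound and threads won't help; if the second is much larger, the processing itself is worth optimizing.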

You may be able to save a little time by not reading line by line but going directly to the first occurrence of "area=", by setting the input record separator:

    { # block to limit the effect of local
        local $/ = "\narea=";
        <$fh>;                     # read until the first "\narea="
    } # here we are back to reading until the end of the line

    if (not eof $fh) {             # if the end of the file hasn't been reached while trying to find "area="
        $hash{$dir}{area} = <$fh>; # read the rest of the line
    }

Edit: it seems there can only be one "area=" in each file, so I have removed the outer loop. If you keep your current code, you can stop reading the file with last as soon as you have found a result. Also, note that my example would miss an "area=" line if it is the very first line of the file (there is no preceding "\n" to match).
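For reference, the early-exit version of the original line-by-line loop looks like this. It is a self-contained sketch: the in-memory filehandle, $dir, and %hash are stand-ins for the real script's file and variables:

```perl
use strict;
use warnings;

my %hash;
my $dir = 'example_dir';    # stand-in for the directory being processed

# Open a handle on an in-memory string so the sketch is self-contained;
# in the real script this would be the file inside $dir.
my $data = "name=foo\narea=42\ncolor=blue\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

while (<$fh>) {
    if (/^area=(.*)/) {
        $hash{$dir}{area} = $1;
        last;               # only one "area=" per file, so stop reading here
    }
}
close $fh;

print "$hash{$dir}{area}\n";    # prints "42"
```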

Replies are listed 'Best First'.
Re^2: hashes & threads
by gravid (Novice) on Jul 28, 2016 at 14:50 UTC

    Hi,

    I did the check you suggested, and I see a fast runtime with the empty while {} loop, so threads will not help here, unless I submit a grid job from each thread.

    I wonder if a job can be just a subroutine, or if it has to be a program.

    Guy

      If your script runs significantly faster with the empty while loop (while still reading the files), threads can make a difference, because the time is not just spent waiting for the files to be read. If it does not run faster, parallelisation of any sort is not going to be any more efficient, because all your parallel threads or processes will simply wait their turn to access the device.

      Although if, unlike what I expected, processing the file takes more time than reading it, you should try to find out why, because the regular expression should be quite efficient: the ^ anchor makes it fail after checking only a few characters on a non-matching line. Maybe returning from the function as soon as you have found the "area=" line would be a significant improvement.

      As a general rule, you should try to find out exactly which part of the program is slowing it down (benchmark it), and why, before you come up with solutions. Optimizing everything is often a bad idea, because most of the program runs so ridiculously fast compared to the slowest part that not keeping the simpler version is a waste of development time.