avanta has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I am working on a script that reads multiple files from a folder, after applying a filter on the file name, and writes their contents into a single file.

Now I need to speed things up, so I wish to split the whole process into a number of instances. I wrote the following code, but I am not able to get the expected result.
use strict;
use warnings;

opendir DATDIR, "dats/";
my @files = grep { $_ ne '.' && $_ ne '..' } readdir (DATDIR);
my $dat = "dats/";
#my @files = <$dat/*.*>;
my @datfiles = sort {$a cmp $b} @files;
my @print = "";
my $toNumofFiles = @datfiles;
my $instance = 5;
my $report = "2010_01";
my $filesperinstance;
my $remainingfiles;
my $print;

if($toNumofFiles < $instance)
{
    $filesperinstance =1;
    $instance = $toNumofFiles;
}
else
{
    $filesperinstance= $toNumofFiles / $instance;
    $remainingfiles = $toNumofFiles % $instance;
}

my @childs = ();
my $startArrayIndex = 0;
my $endArrayIndex=$filesperinstance;

for(1..$instance)
{
    if ($remainingfiles !=0)
    {
        $endArrayIndex = $endArrayIndex + 1;
        $remainingfiles = $remainingfiles -1;
    }
    my $pid = fork();
    if($pid)
    {
        push(@childs,$pid);
    }
    elsif($pid==0)
    {
        my $locStartArrayIndex = $startArrayIndex;
        my $locEndArrayIndex = $endArrayIndex;
        for (my $fileIndex = $locStartArrayIndex; $fileIndex < $locEndArrayIndex; $fileIndex++)
        {
            if($datfiles[$fileIndex] =~ m/^$report/)
            {
                print $datfiles[$fileIndex],"\n\n";
                my $datDir = "dats/$datfiles[$fileIndex]";
                print $datDir,"\nDat Dir\n";
                read FILE, "< $datDir";
                @print = <FILE>;
            }
        }
        exit(0);
    }
    else
    {
        exit;
    }
    $startArrayIndex = $endArrayIndex;
    $endArrayIndex = $endArrayIndex + $filesperinstance;
    print $_,"\ninstance \n";
}

foreach(@childs)
{
    waitpid($_,0);
}
print @print,"\nPrint\n";


I am getting this error:

Can't modify string in read at try.pl line 62, near ""< $datDir";"
Not enough arguments for read at try.pl line 62, near ""< $datDir";"
Execution of try.pl aborted due to compilation errors.

Kindly tell me where I am going wrong?

Thanks
AvantA

Re: how do I create parrallel processing in this script?
by moritz (Cardinal) on Feb 05, 2010 at 09:24 UTC
    Kindly tell me where I am going wrong?

    It seems you confused read with open. And please remember also to close the files you open.
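For illustration: `read` expects a filehandle that is already open, plus a buffer and a length, while `open` is what attaches a handle to a file. The following is a minimal sketch of what the failing line was probably meant to do, using three-argument `open` with a lexical filehandle and an explicit `close`. The helper name `slurp_lines` is made up for this example and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script):
# open one file, return all of its lines, and close it again.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$fh>;                      # list context: read every line
    close($fh) or die "Cannot close $path: $!";
    return @lines;
}
```

In the original loop this could be called as `my @lines = slurp_lines("dats/$datfiles[$fileIndex]");`. Checking the return value of `open` means a missing or unreadable file is reported at once instead of silently producing no data.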

    Perl 6 - links to (nearly) everything that is Perl 6.
      Damn!
      What a mistake.
      Thanks a ton, buddy. By the way, I am not getting the output values. Is there anything else which I may be missing?
        is there anything else which I may be missing??

        Yeah, see the second paragraph in my reply. Forking does not work that way.


        All dogma is stupid.
Re: how do I create parrallel processing in this script?
by tirwhan (Abbot) on Feb 05, 2010 at 09:40 UTC

    In addition to moritz's correct answer, I'll caution that you are unlikely to see a big difference in processing time from parallelizing the input operation. I/O is limited by the disk speed, and turning a single process that reads one file at a time into several processes that read these files in parallel will likely hurt performance, because the disk head has to keep jumping to a new position while reading. So, unless you have a very unusual drive configuration (or weird limits set on your single process by the OS), this will probably slow down the entire process rather than speed it up.

    Also, reading the file into an array in a forked child and then printing that array in the parent won't work. Data structures are not shared between processes after a fork: the child works on its own copy, so you need some form of IPC to get the data back to the parent.
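A small sketch of that point, assuming a Unix-like system where `fork` creates a real process: the child changes its own copy of `@print`, the parent's copy stays empty, and a pipe is used as one possible form of IPC to hand lines back. The helper name `lines_from_child` is invented for this example. `POSIX::_exit` is used in the child so that it does not flush output buffers inherited from the parent.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper: run $work->() in a forked child and return
# the lines that the child writes into a pipe.
sub lines_from_child {
    my ($work) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                  # child process
        close $reader;
        print {$writer} $work->();
        close $writer;                # flushes the data into the pipe
        _exit(0);                     # leave without touching inherited buffers
    }

    close $writer;                    # parent must close its copy of the write end
    my @lines = <$reader>;            # reads until the child closes its end
    close $reader;
    waitpid($pid, 0);
    return @lines;
}

my @print;                            # this array belongs to the parent
my @got = lines_from_child(sub {
    @print = ("set in child\n");      # only changes the child's copy
    return ("line 1\n", "line 2\n");
});

print "parent \@print has ", scalar(@print), " elements\n";
print "received: $_" for @got;
```

With several children, the parent would keep one pipe per child and read them in turn after forking all of them.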


      Thanks a lot for the suggestion. I think I need some more help. In this code I wanted to read the files, split the work across threads, and then join the output of the threads into one file. How can I do it other than by using fork()?

        Hmm. As I said above, there's not much point in doing this, because you almost certainly won't improve performance. However, if you did want to do this you could use one of the following methods:

        - print to the output file from the children instead of the parent. You'll need to lock the output file in some way (see "perldoc -q lock" for some ways of doing that) so that the writes are not completely jumbled.

        - use some form of Inter Process Communication to read the file content into the parent process. Read the documentation I linked to or search CPAN for IPC for various ways of doing this.

        - use threads and a shared variable instead of fork.
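As a rough sketch of the first option in the list above, each child could append its data to the shared output file while holding an exclusive `flock`. This assumes a Unix-like system and a filesystem where `flock` works (it can be unreliable on some network filesystems). The names `append_locked` and `merge_in_children` are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use POSIX qw(_exit);

# Hypothetical helper: append @lines to $out_path under an exclusive lock.
sub append_locked {
    my ($out_path, @lines) = @_;
    open(my $out, '>>', $out_path) or die "Cannot open $out_path: $!";
    flock($out, LOCK_EX)           or die "Cannot lock $out_path: $!";
    print {$out} @lines;
    close($out) or die "Cannot close $out_path: $!";  # flushes, then releases the lock
    return;
}

# Hypothetical driver: one child per input file, all appending to $out_path.
sub merge_in_children {
    my ($out_path, @inputs) = @_;
    my @pids;
    for my $in (@inputs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                              # child process
            my $ok = eval {
                open(my $fh, '<', $in) or die "Cannot open $in: $!";
                my @lines = <$fh>;
                close $fh;
                append_locked($out_path, @lines);
                1;
            };
            _exit($ok ? 0 : 1);
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;                         # wait for every child
    return;
}
```

The lock keeps the lines of one input file together in the output, but the order in which the files appear depends on which child gets the lock first, so it is not deterministic.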


Re: how do I create parrallel processing in this script?
by cdarke (Prior) on Feb 05, 2010 at 14:21 UTC
    In addition to the excellent answers above, your practice of reading the whole file into an array could be causing a performance issue. Your process might be paging like mad.

    Just processing one record at a time, if that is possible, could speed it up, or maybe using tie?
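A minimal sketch of the record-at-a-time idea, with no forking at all: each matching file is copied to the output one line at a time, so only a single line is held in memory. The helper name `concat_matching` and its arguments are assumptions made for this example, and the output file is assumed to live outside the input directory.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every plain file in $dir whose name starts
# with $prefix into $out_path, line by line. Returns the number of files.
sub concat_matching {
    my ($dir, $prefix, $out_path) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = sort grep { /^\Q$prefix\E/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);

    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";
    for my $name (@names) {
        open(my $in, '<', "$dir/$name") or die "Cannot open $dir/$name: $!";
        while (my $line = <$in>) {        # scalar context: one line per iteration
            print {$out} $line;
        }
        close($in);
    }
    close($out) or die "Cannot close $out_path: $!";
    return scalar @names;
}
```

For the script in the question, a call might look like `concat_matching('dats', '2010_01', 'combined.txt')`, where `combined.txt` is a placeholder output name.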