Noame has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have 2 Perl programs: 'p1' and 'p2'.
'p1' is the parent program and 'p2' is the child.
Program 'p2' generates two report files: txt and Excel.
Generating the txt report takes 3 minutes and the Excel file takes 10 minutes.
'p1' calls 'p2' (fork) and waits for the text report; once the text report is ready, 'p1' should continue in parallel with 'p2'.
Finally, 'p1' should wait for 'p2' (check its status) and print a success message.

Please advise how to write this using fork, wait and all the relevant checks around them.

Thanks
Replies are listed 'Best First'.
Re: Split process->check ->and run Parallel
by Corion (Patriarch) on Dec 23, 2008 at 10:20 UTC

    I'm not really clear on what you want to do. The basic usage of fork is as follows:

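    my $child = fork();
    die "fork failed: $!" unless defined $child;
    if ($child) {
        # in parent: poll until the text report shows up
        while (! -f "output.txt") {
            print "Waiting for output.txt\n";
            sleep 10;
        }
        print "Found output.txt, continuing\n";
    } else {
        # in child, just run p2 (exec only returns if it fails)
        exec 'p2' or die "exec p2 failed: $!";
    }
    # do more stuff
    print "Waiting for child to finish\n";
    waitpid($child, 0);    # wait() takes no arguments; waitpid names the child
    print "p2 exit status: ", $? >> 8, "\n";
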
      Checking for the existence of the file 'output.txt' independently in the parent process may not be the right idea, because there is no guarantee that the file was generated by the child process.

      It is possible that after p1 forks and execs p2, p2 is signaled to terminate and some other process p3 creates a dummy file 'output.txt'. In that case, p1 will assume that everything ran well and that output.txt is available.

      There is one more problem with this approach. Let's say output.txt is being generated properly by the child process p2 but is still being written (say the file will be 1 GB). The moment the parent process p1 sees the file, it will assume successful completion and continue with its work, yet there is no guarantee that p2 will run to completion and create the full 1 GB file (this is just an example).

      So it's better for the child process to signal the parent process after completing its work, or a chunk of its work, as the requirements dictate.
Re: Split process->check ->and run Parallel
by roboticus (Chancellor) on Dec 23, 2008 at 14:54 UTC
    Noame:

    If I'm interpreting your question correctly, then you're wanting P1 and P2 to work something like this:

    P1 [A------B-------C--------D-------E]
    P2        [1-----2--------3]

    A: P1 starts working
    B: User wants a report, so P1 spawns job P2. P1 now waits.....
    C: P1 detects that the .TXT file is ready, so it resumes working
    D: P1 detects that the .XLS file is ready, so it reports "succeeded"
    E: P1 ends

    1: P2 starts and begins work on the .TXT report
    2: P2 finishes the .TXT report and begins work on the .XLS report
    3: P2 completes the .XLS report and ends.

    So your first task is to figure out how to communicate between your processes, and what messages you need to communicate. For your task, you could get by with three messages, all from P2 to P1. The first message would be something like "The text file is ready", the second message would be "the excel file is ready", and the last message would be "Error!"

    Now how to perform the communications? While there are packages available for interprocess communications, I wouldn't use them for this job. Instead, I'd use indirect communications by having P2 create explicit clues. For example, when the text file is complete, the file "REPORT.TXT" would appear in the output directory. When the excel file is complete, then "REPORT.XLS" would appear in the output directory. If P2 wants to report an error, then "REPORT.FAILED" would appear in the output directory.

    Hopefully, you already know how to perform task P2, so we'll skip that.

    Next, you need to figure out how P1 will receive the messages. Luckily, Perl provides -e to detect the existence of a file, so you can use it to look for your three messages. So P1 will spawn P2 and wait for REPORT.TXT or REPORT.FAILED to appear. If REPORT.TXT appears, then it can continue its own work until REPORT.XLS or REPORT.FAILED appears.
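
A minimal sketch of that polling step, assuming the marker file names suggested above. The helper name wait_for_marker, the timeout and the polling interval are made up for illustration, and an empty file in a scratch directory stands in for P2's real output:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): poll $dir until one of the
# marker files in @$markers exists. Returns the first marker found, in the
# order given, or undef if $timeout seconds pass with none present.
sub wait_for_marker {
    my ($dir, $markers, $timeout, $interval) = @_;
    my $deadline = time() + $timeout;
    while (1) {
        for my $name (@$markers) {
            return $name if -e "$dir/$name";
        }
        return undef if time() >= $deadline;
        sleep $interval;
    }
}

# Demo in a scratch directory, with an empty file standing in for P2's output.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/REPORT.TXT" or die "cannot create marker: $!";
close $fh;

# REPORT.FAILED is listed first so that a failure wins if both files exist.
my $seen = wait_for_marker($dir, ['REPORT.FAILED', 'REPORT.TXT'], 5, 1);
print defined $seen ? "saw $seen\n" : "timed out\n";
```

The timeout is what covers the "P2 hangs forever" case: an undef return means neither marker appeared in time, and P1 has to decide what to do about it.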

    Finally, you need to figure out what error handling you need. So, what errors would you expect to see? P2 could hang forever, never generating any messages. P2 could fail after generating the text file never creating REPORT.XLS or REPORT.FAILED. So think of all the basic scenarios. For each of the error cases you can think of, figure out how to detect it, and how to correct it. Note: You need to think of all sorts of oddball cases, such as: What happens if P1 crashes, and you restart it when P2 is running? How are you going to handle it when the new P1 starts a P2 job and you now have two P2 jobs running? There can be many potential error cases, so be sure to stretch your imagination here. (As you're laying out your code blocks, think of what assumptions you're making, and what could fail for the operations you're performing. That will help you find many of your error cases.)

    Now that you know the messages, the interactions, and what sort of error handling is required, lay out your code blocks, write 'em and test 'em.

    ...roboticus
      I think this is something similar to what I wrote in reply to Corion's post.

      "For example, when the text file is complete, the file "REPORT.TXT" would appear in the output directory. When the excel file is complete, then "REPORT.XLS" would appear in the output directory. If P2 wants to report an error, then "REPORT.FAILED" would appear in the output directory."

      Here are 2 reasons why this may not be enough:

      1) This approach may not suffice when other processes or users are able to create or write files in that directory under the same name that process 'p1' is expecting.

      2) If the file is still being written to after it is created, then process 'p1' should ensure that the file to be used (output.txt) is indeed complete (that is, that no process still holds an open file handle on it).
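
One common way to deal with the second point (not something proposed in this thread) is for 'p2' to write the report under a temporary name and rename it into place only when it is finished. This is a sketch under assumptions: the helper name publish_report and the ".tmp.PID" suffix are hypothetical, and rename is only atomic when both names are on the same filesystem on a POSIX-like system. It does nothing about the first point.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write $content to a temporary file next to
# $final_path, then rename it into place. A reader polling for
# $final_path never sees a partially written report.
sub publish_report {
    my ($final_path, $content) = @_;
    my $tmp_path = "$final_path.tmp.$$";    # $$ is this process's PID

    open my $fh, '>', $tmp_path or die "open $tmp_path: $!";
    print {$fh} $content        or die "write $tmp_path: $!";
    close $fh                   or die "close $tmp_path: $!";

    rename $tmp_path, $final_path or die "rename to $final_path: $!";
    return $final_path;
}

# Demo in a scratch directory.
my $dir   = tempdir(CLEANUP => 1);
my $final = publish_report("$dir/REPORT.TXT", "line 1\nline 2\n");
print "published $final\n";
```

With this in place, the mere existence of REPORT.TXT tells 'p1' that the writer has already closed the file.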

        It would have enabled us to help you better if you had mentioned all these prerequisites in your original post instead of after you got solutions. Of course, if other users can create a file with the same name as the "P2" program, then you have other problems on your system, because then nobody will know whose report got generated and whose report got overwritten.

        matrixmadhan:

        You're absolutely correct. But I wasn't trying to give him/her a complete solution--I wanted to give some hints. That's why I mentioned that the OP should look for various error conditions and solve them. After all, we've gotta leave some work for the OP!

        ...roboticus
Re: Split process->check ->and run Parallel
by kyle (Abbot) on Dec 23, 2008 at 17:06 UTC

    Have p1 fork and exec p2. Have p2 write the txt report and then send a signal (with kill and getppid) to p1. After the fork, p1 can just sleep until it gets that signal. It would probably be a good idea to have p1 check that p2 is still alive (with kill or waitpid) once in a while. Once p1 gets its signal, it can do what it wants with the txt report while p2 works on the Excel report. When p1 is ready to wait again, just waitpid for p2 and check its exit status to make sure there was no error.
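
A rough sketch of that sequence, under some assumptions: SIGUSR1 is an arbitrary choice of signal, and the child here only pretends to be p2 by sleeping, where a real p1 would exec 'p2' and p2 itself would send the signal:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my $txt_ready = 0;
$SIG{USR1} = sub { $txt_ready = 1 };    # install before fork to avoid a race

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: stand-in for p2 (a real p1 would exec 'p2' here).
    $SIG{USR1} = 'DEFAULT';
    sleep 1;                   # pretend to write the txt report
    kill 'USR1', getppid();    # tell the parent the txt report is done
    sleep 1;                   # pretend to write the Excel report
    exit 0;
}

# Parent: sleep until the signal arrives, checking that p2 is still alive.
my $status;
until ($txt_ready) {
    if (waitpid($pid, WNOHANG) == $pid) {
        $status = $?;          # child ended before signalling
        last;
    }
    sleep 1;                   # an incoming signal cuts the sleep short
}
die "p2 ended before the txt report was ready\n" unless $txt_ready;
print "txt report ready, p1 continues while p2 works on the Excel report\n";

# ... p1's own work with the txt report would go here ...

# Finally wait for p2 (unless it was already reaped) and check its status.
unless (defined $status) {
    waitpid($pid, 0);
    $status = $?;
}
my $exit_code = $status >> 8;   # $status & 127 would hold a fatal signal
print $status == 0 ? "p2 succeeded\n" : "p2 failed (wait status $status)\n";
```

Installing the handler before the fork matters: if it were installed afterwards, a fast child could send SIGUSR1 before the parent was ready, and the default action for that signal would kill p1.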

      It would probably be a good idea to have p1 check that p2 is still alive (with kill or waitpid) once in a while

      But why is this needed? If a child process is terminated or killed, a SIGCHLD will be delivered to the parent by default, so I don't think an explicit check for the existence of the child process is needed.
        matrixmadhan:

        It's me again!

        The kill function doesn't have to terminate another process. There are other signals that it can send. You can send several different signals to another process with it, such as:

        $ kill -l
         1) SIGHUP      2) SIGINT      3) SIGQUIT     4) SIGILL
         5) SIGTRAP     6) SIGABRT     7) SIGEMT      8) SIGFPE
         9) SIGKILL    10) SIGBUS     11) SIGSEGV    12) SIGSYS
        13) SIGPIPE    14) SIGALRM    15) SIGTERM    16) SIGURG
        17) SIGSTOP    18) SIGTSTP    19) SIGCONT    20) SIGCHLD
        21) SIGTTIN    22) SIGTTOU    23) SIGIO      24) SIGXCPU
        25) SIGXFSZ    26) SIGVTALRM  27) SIGPROF    28) SIGWINCH
        29) SIGLOST    30) SIGUSR1    31) SIGUSR2    32) SIGRTMAX

        If both processes agree on a signal for communication, then you can put a signal handling function in your processes to receive the messages. For example, you could use SIGUSR1 to let P2 tell P1 that the text report is done. Then you could send SIGUSR2 to indicate that the spreadsheet is done.

        ...roboticus

        You're right. The parent can just trap SIGCHLD and whatever "I'm done" signal the child is sending and sleep.
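
A sketch of that variant, where the parent traps both signals and simply sleeps. It assumes SIGUSR1 as the "I'm done" signal, and again the child only simulates p2 by sleeping instead of doing real work:

```perl
use strict;
use warnings;

my ($txt_ready, $child_exited) = (0, 0);
$SIG{USR1} = sub { $txt_ready    = 1 };   # "txt report done" from the child
$SIG{CHLD} = sub { $child_exited = 1 };   # the child has terminated

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: stand-in for p2 (a real p1 would exec 'p2' here).
    $SIG{USR1} = 'DEFAULT';
    $SIG{CHLD} = 'DEFAULT';
    sleep 1;                   # pretend to write the txt report
    kill 'USR1', getppid();
    sleep 1;                   # pretend to write the Excel report
    exit 0;
}

# Sleep until either signal arrives; no explicit liveness check needed.
sleep 1 until $txt_ready || $child_exited;
die "p2 exited before the txt report was ready\n" unless $txt_ready;
print "txt report ready, p1 carries on\n";

# ... p1's own work would go here; with nothing left to do, it sleeps
# until SIGCHLD says the child is gone ...
sleep 1 until $child_exited;

# The handler only set a flag, so the exit status is still there to collect.
my $reaped = waitpid($pid, 0);
die "unexpected waitpid result $reaped\n" unless $reaped == $pid;
my $exit_code = $? >> 8;
print $exit_code == 0 ? "p2 succeeded\n" : "p2 failed with exit code $exit_code\n";
```

Keeping the handlers down to setting a flag leaves the actual reaping in the main flow, so there is no question of the handler running before $pid has been assigned.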