Pipes? Child Processes?

arrow has asked for the wisdom of the Perl Monks concerning the following question:

The Mail Clerk and the Magic Ring by chromatic (Archbishop) on Dec 05, 2002 at 03:19 UTC
Once upon a time, there was a little clerk who worked in a little post office. There wasn't much mail to sort, so he could take his time, doing one job at a time, until everything was finished.

Over time, more and more mail came in. Some of it had to be processed very quickly, and some of it took a long time to sort properly. He was only one clerk, though, so he could only sort one piece of mail at a time, no matter how long it took. This didn't make his customers very happy. Before long, he was awash in complaints, which just gave him more to do.

The clerk decided to try an experiment. He would work for only a few moments on each piece of mail, then would switch to the next piece. He hoped this would prevent any one piece of mail from waiting too long to be sorted. Unfortunately, he could only do a certain amount of sorting in a day, and he was spending a lot of time switching between pieces of mail, even touching some pieces of mail dozens of times before they were sorted.

Luckily, he had a magic ring. The ring had a very special power: it could make a magic clone of the clerk in his office. There was one drawback, though. The clerk and the clone could not share their thoughts. The clerk had a great memory. He could remember every piece of mail he had sorted in a day. So could the clone. Neither could remember what the other had done, though.

The clerk tried the ring. It worked like, well, magic. He had a clone that would also process mail. A few minutes later, the clerk realized that both he and his clone were working on the same pieces of mail. He decided that he would be better off choosing a piece of mail, creating a clone, and then switching to a new piece of mail. That worked so well, he stopped sorting mail altogether. Instead, he created clones. Each clone would process one piece of mail and then disappear. He would keep track of which piece of mail he handed to each clone, and so would know which mail had been processed.

There was one problem remaining.
What would happen if there were a problem with the piece of mail? For example, some pieces of mail were missing names. Others had bad addresses. The clones would just stop before they disappeared, and the mail would not be sorted.

One option would be for the clone to put the piece of mail back in the pile if there were a problem. This was not good, though, because the problem would never be fixed. The clerk also did not have time to check each piece of mail himself to see if it had been sorted, as he was too busy using the magic ring to make clones and to make the clones disappear.

The clerk finally decided that he needed some way to talk to his clones, so he used the ring to give each clone a magic walkie-talkie. If a clone discovered that there was a problem with its piece of mail, it would report the problem back to the clerk, and he could make a note of it in his memory. This was widely regarded as a good solution, and everyone was happy, except for the clerk's wife, who had to make extra potatoes on the day the clerk forgot to tell the clones to disappear after they had sorted their mail. But that is a story for a different time.
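For readers who want to map the story onto Perl, here is a minimal sketch, assuming a Unix-like system with fork. The clerk is the parent process, each clone is a child created with fork, the walkie-talkie is a pipe shared by all the children, and "disappearing" is the child's exit followed by the parent's waitpid. The mail records and the check_mail rule are made up purely for illustration.

```perl
use strict;
use warnings;

# Made-up rule standing in for "sorting" one piece of mail: returns a
# description of the problem, or nothing if the piece is fine.
sub check_mail {
    my ($mail) = @_;
    return 'missing name'    unless length($mail->{name}    // '');
    return 'missing address' unless length($mail->{address} // '');
    return;
}

# The clerk: fork one clone (child) per piece of mail. Every clone gets the
# write end of one shared pipe (the walkie-talkie) for reporting problems.
sub sort_with_clones {
    my @pile = @_;
    pipe(my $listen, my $talk) or die "pipe failed: $!";

    my %handed_out;    # pid => index of the piece of mail given to that clone
    for my $i (0 .. $#pile) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                  # in the clone
            close $listen;
            my $problem = check_mail($pile[$i]);
            # Writes shorter than PIPE_BUF bytes are atomic, so reports
            # from different clones do not get mixed together.
            syswrite $talk, "$i\t$problem\n" if defined $problem;
            close $talk;
            exit 0;    # disappear; without this the clone keeps running the loop
        }
        $handed_out{$pid} = $i;           # in the clerk
    }
    close $talk;    # the clerk keeps no write end, so reading below ends at EOF

    my %problems;    # index of piece of mail => reported problem
    while (my $report = <$listen>) {
        chomp $report;
        my ($i, $problem) = split /\t/, $report, 2;
        $problems{$i} = $problem;
    }
    close $listen;

    # Reap every clone so none lingers as a zombie.
    waitpid($_, 0) for keys %handed_out;
    return \%problems;
}
```

Leaving out the `exit 0` in the child is the potato problem: the clone falls through into the clerk's loop and starts making clones of its own.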
Re: Pipes? Child Processes? by belg4mit (Prior) on Dec 05, 2002 at 03:42 UTC
These things are very applicable to "the Internet and CGI". On systems that support them (for forking, UN*X or a derivative is usually required for good behavior), they are incredibly powerful. My hunch is that you may not understand what they actually are, and I suspect that understanding that would answer your question. perlipc has a good and thorough explanation, but it's mostly a how. A good perl book would go a long way in describing what, but I can try.

Pipes are one way of letting processes communicate. Your CGI programs are probably communicating with the web server over a pipe, but you don't have to know that. That's one of the beautiful things about pipes. The web server is talking with your web browser over a socket, but both of them have to be aware of this and share an agreed-upon language and etiquette.

As for children, forking simply creates another process (running instance of the program) that can do work. Forking also has some extra properties: it's quicker than actually running the same program again, it's efficient (YMMV), and it requires a little maintenance, since the parent should reap its children when they exit. Another way to think of it is in terms of farming. A farm is a lot of work; if you have lots of children, you can pawn the work off on them.

-- I'm not Belgian but I play one on TV.
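As a small illustration of "you don't have to know that": a script just prints to STDOUT, whether that is attached to a terminal, a pipe, a socket, or a file. If you are curious, Perl's file test operators (-t, -p, -S, -f) can tell you which one it is. The describe_handle name is invented for this sketch.

```perl
use strict;
use warnings;

# Report what kind of thing a filehandle is connected to, using Perl's
# file test operators. Ordinary code (a CGI script, say) never needs this;
# it just reads and writes, and the other end is someone else's business.
sub describe_handle {
    my ($fh) = @_;
    return 'terminal' if -t $fh;    # a tty
    return 'pipe'     if -p $fh;    # a pipe or FIFO
    return 'socket'   if -S $fh;    # a socket
    return 'file'     if -f $fh;    # a plain file
    return 'other';
}
```

In a hypothetical script show.pl containing `print STDERR describe_handle(\*STDOUT), "\n";`, running `perl show.pl | cat` should report a pipe and `perl show.pl > out.txt` a file, while the script itself does nothing differently.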
Re: Pipes? Child Processes? by dbp (Pilgrim) on Dec 05, 2002 at 04:46 UTC
I'm not going to say much about children because the previous posts provide quite a lot of info. To understand pipes you need to know a bit about the unix programming philosophy. A typical unix distribution comes with a lot of small programs that each do one thing really well. For example, cat streams the contents of a file, grep matches patterns, less lets you browse text, gzip compresses files, etc. Pipes allow you to "pipe" the output of one program to the input of another in a rather transparent way.

Every program's input and output are always connected to something. By default the shell connects them to your terminal, so your keyboard input goes to the program you are currently running and its output is sent back to the terminal. You can think of unix as a system of little black boxes that each do one thing. There are a bunch of pipes lying around that can be connected between the black boxes however you wish. By rewiring the pipes, you can create a multitude of complex behaviors out of a collection of simple machines.

You can solve a lot of programming problems on a unix system simply by piping a bunch of small programs together. For example, if I wanted to find all the connections to my web server from IP 216.122.66.112, I might write this perl script:

`open my $in, '<', 'access.log' or die "Can't open access.log: $!"; while (<$in>) { print if /^216\.122\.66\.112/; } close $in;`

But say I want to browse these results in less. I'd have to write my output to a file and then call less on that file. Or I could do this in the shell:

`% grep '^216\.122\.66\.112' access.log | less -S`

Or I could pipe the output of my perl program to less. Rewriting less in perl would be a real pain, but any program can let the user browse its output with less with the help of a pipe. It's also very nice to be able to pipe output from another process into your perl program so you don't have to continually reinvent the wheel.
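Perl's piped open covers both directions described above. A minimal sketch, assuming grep and less are on the PATH: matching_lines reads another process's output into perl, and page_lines hands perl's output to a pager. Both sub names are made up for this example.

```perl
use strict;
use warnings;

# Read the output of an existing tool (grep) into perl instead of
# reimplementing it. The list form of open runs grep directly, without a
# shell, so the pattern and file name need no extra quoting.
sub matching_lines {
    my ($pattern, $file) = @_;
    open(my $grep, '-|', 'grep', '--', $pattern, $file)
        or die "Can't run grep: $!";
    my @lines = <$grep>;
    close $grep;    # sets $? to grep's wait status
    # grep exits 0 if something matched, 1 if nothing matched, >1 on error.
    my $status = $? >> 8;
    die "grep failed (status $status)\n" if ($? & 127) || $status > 1;
    return @lines;
}

# The other direction: send perl's output through a pager, like `| less -S`.
sub page_lines {
    my @lines = @_;
    open(my $pager, '|-', 'less', '-S') or die "Can't run less: $!";
    print {$pager} @lines;
    close $pager;    # waits until the user quits less
}
```

For instance, `page_lines(matching_lines('^216\.122\.66\.112', 'access.log'));` would browse the hits from the log example in less.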
If you want to learn more about this sort of thing, I'd suggest picking up a good general unix reference as a first step. For example, W. Richard Stevens has written a number of great unix books, covering topics from interprocess communication to network programming. Once you have a handle on unix interprocess communication, chapter 16 of the Perl Cookbook will provide you with most of what you need to know about the how in Perl.
Re: Pipes? Child Processes? by pg (Canon) on Dec 05, 2002 at 03:15 UTC
There are many different ways to communicate among processes: pipes, sockets, ..., SOAP. Each has its advantages and disadvantages, but on unix, using a pipe probably requires the least development effort. However, pipes are not entirely portable.

Some interesting points about pipes:

- A pipe is the abstraction of a unidirectional stream of communication. It involves two processes, one holding the write end and one holding the read end.
- The stream going through a pipe is unstructured, as far as the pipe is concerned. Whether the data you send has any structure is your design decision; your programs care about it and understand it, but the pipe does not.
- Internally, a pipe is essentially a buffer, two descriptors, and two pointers. The write pointer advances when the writer process writes to the buffer, and the read pointer advances when the reader process reads from it.
- When your process writes to or reads from a descriptor, it does not really care whether the descriptor refers to a pipe, a file, or anything else. It is just another descriptor.

Another thing I want to mention is that, traditionally, Perl has dealt with lots of forked processes, but I personally believe that as Perl's threading support becomes more stable and powerful, we should expect fewer forks in the future. (That is, lots of things that required fork in the past will be done with threads in the future. However, as long as processes exist, inter-process communication will remain. There are lots of articles about why to use processes and why to use threads; my view is that it really depends on the application, configuration, environment, situation, etc.)
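To make the "unstructured stream" and "just another descriptor" points concrete, here is a sketch using Perl's built-in pipe and fork on a Unix-like system. The one-record-per-line, tab-separated format is an assumption of this example, not something the pipe imposes, and write_records and read_records work unchanged on an ordinary file handle. All three sub names are illustrative.

```perl
use strict;
use warnings;

# The pipe only carries bytes; any structure is a convention between the
# writer and the reader. Assumed convention here: one record per line,
# key and value separated by a tab (so keys must not contain tabs and
# neither part may contain a newline).
sub write_records {
    my ($fh, @records) = @_;
    print {$fh} "$_->[0]\t$_->[1]\n" for @records;
}

sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        push @records, [ split /\t/, $line, 2 ];
    }
    return @records;
}

# A child process holds the write end, the parent holds the read end.
sub records_via_pipe {
    my @records = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {    # child: writer only
        close $reader;
        write_records($writer, @records);
        close $writer;  # flushes and signals EOF to the reader
        exit 0;
    }
    close $writer;      # parent: reader only, or EOF would never arrive
    my @got = read_records($reader);
    close $reader;
    waitpid($pid, 0);
    return @got;
}
```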
Re: Pipes? Child Processes? by TGI (Parson) on Dec 05, 2002 at 03:31 UTC
Sometimes you want to do many things at once. For example, imagine that you have a database of URLs you want to index. You could write a perl script that requests each URL in turn and processes the resulting document. Or you could write a script that forks many children to request and process the URLs. The parent script can then manage the children and harvest the data that they produce. Since this sort of application is unlikely to be bound by CPU, RAM, or network bandwidth (latency would be the key issue here), it is an excellent candidate for this approach.

Sometimes forking is used to reduce exposure to security risks. A process may need an elevated level of access to complete a task. If most of the task can be done with limited access, it can be useful to fork and exec a tool that is limited to executing only the portion of the task that needs enhanced access.

TGI says moo
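The parent-manages-children pattern might look like the following sketch. It assumes a Unix-like fork, caps the number of simultaneous children, and has each child report success or failure through its exit status. fetch_and_index is a hypothetical placeholder that only checks the URL scheme so the example runs without a network; a real indexer would fetch the page (with LWP::UserAgent, for example) and, because an exit status carries only a few bits, would send any harvested data back over a pipe or through files.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "request the URL and index the document".
# Returns true on success.
sub fetch_and_index {
    my ($url) = @_;
    return $url =~ m{^https?://};
}

# Fork at most $max_kids children at a time; each child handles one URL.
# Returns a hash ref of URL => 1 (child exited 0) or 0 (child failed).
sub index_urls {
    my ($max_kids, @urls) = @_;
    die "need at least one child slot\n" unless $max_kids >= 1;
    my %url_of;    # pid => URL, for children still running
    my %ok;        # URL => success flag

    # Note: wait() reaps *any* child of this process, so this sketch
    # assumes nothing else in the script forks children.
    my $reap_one = sub {
        my $pid = wait;
        die "lost track of children\n" if $pid == -1;
        $ok{ delete $url_of{$pid} } = ($? == 0) ? 1 : 0;
    };

    for my $url (@urls) {
        $reap_one->() while scalar(keys %url_of) >= $max_kids;    # pool full
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        exit(fetch_and_index($url) ? 0 : 1) if $pid == 0;         # child
        $url_of{$pid} = $url;                                      # parent
    }
    $reap_one->() while %url_of;    # harvest the stragglers
    return \%ok;
}
```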
Re: Pipes? Child Processes? by graff (Chancellor) on Dec 05, 2002 at 05:14 UTC
One very nice thing about using pipes is that they give you a very easy way to make use of tools that already exist for specific tasks. If there is already a program that works on a command line and does a particular kind of data filtering, then you don't need to link in an extra library, or install another module, or (God forbid) rewrite that filtering process in your own perl code; just use that existing command-line utility as part of a pipe.

Most of the beauty of the classic UNIX command-line utilities (many of which have been diligently built into the perl core: sort, grep, sed, cut, paste, ls, find, ...) is that each one by itself does some simple thing very well, with an appropriate range of flexible options for tweaking its behavior; and pipeline commands let you plug these things together in various ways to perform a vast range of useful tasks, with just the shell command line as your programming language.

When you apply this sort of tool design to things like signal or image processing, the payoff is dramatic: a half-dozen or so basic utilities (each fairly simple, offering a handful of parameterized options, and sharing a common notion of what to expect on stdin and what to produce on stdout) will give you a virtually unlimited toolkit. (Well, there are some subtleties involved, and it can get a bit complicated, but the basic idea is still a big win compared to writing a new program every time you have to arrange a given set of operations in a different sequence.)

In essence, as you make heavier use of pipelines, in your perl code and/or on the command line, you'll find that you spend less time (re)writing program code for the next task.
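Using an existing filter from inside perl can be as simple as the sketch below, which relies on the core IPC::Open2 module and assumes sort is available on the PATH. filter_through is an illustrative name, not an existing function.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Run some lines through an existing command-line filter and return its
# output, instead of rewriting the filter in perl.
# Caveat: this writes all input before reading any output. That is safe
# for a filter like sort, which consumes all of its input before printing
# anything, but a streaming filter fed a lot of data can fill both pipe
# buffers and deadlock; use a temporary file or a separate writer process
# for large streams.
sub filter_through {
    my ($cmd, @lines) = @_;
    my $pid = open2(my $from_cmd, my $to_cmd, @$cmd);
    print {$to_cmd} @lines;
    close $to_cmd;    # send EOF so the filter can finish
    my @out = <$from_cmd>;
    close $from_cmd;
    waitpid($pid, 0);
    die "@$cmd exited with status " . ($? >> 8) . "\n" if $?;
    return @out;
}
```

For example, `filter_through(['sort', '-u'], @words)` returns the distinct lines of `@words` in sorted order, with sort doing all the work.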
Re: Pipes? Child Processes? by arrow (Friar) on Dec 06, 2002 at 17:42 UTC
Thanks to everybody who answered my query, especially chromatic (great story, I think...) and belg4mit, who really hit the nail on the head when he guessed I didn't know what I was talking about. Thanks again to all who enlightened me!

Just Another Perl Wannabe