koolgirl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm finally approaching the end of the llama, and almost ready to begin my first big project, which I'm designing now. Perl Monks has helped me through this tremendously, and I thank all of you for your time, effort, patience and encouragement :).

However, there seems to be one last hump I'm having a lot of trouble getting over. I cannot seem to wrap my head around the concepts of process management. I believe a big part of the problem is that I'm not very proficient at system administration on Unix/Linux-based systems, and even though I do work/code on a Linux box, I still have a long way to go as far as system commands and administration are concerned.

I have the first and third editions of the llama book, Perl in a Nutshell, the Camel book and the Cookbook, but none of these (and believe me, I've read and re-read all of them searching for anything that might make the light go on) seem to be enlightening me. The concepts of parent-child relationships, system/exec, processes, forking, process operations, etc. are just completely frustrating to me. I'm seriously at a standstill in my advancement in Perl, because I'm right here at the end, and the last concept is stumping me! Has anyone else experienced this in their Perl quest? Should I stop pursuing Perl and focus on Linux administration for a bit, then return? Any and all suggestions, tips, even harsh lectures meant to slap one in the face with much-needed knowledge, would be welcome at this point. I'm desperate for ideas on how to move forward, as I am so incredibly anxious to finish this book and begin working on my projects.

Thank you all for your time, my dearest Monks. If anyone experienced this same situation, perhaps because of a lack of Linux skills at the time of learning Perl, as in my case, please let me know if any type of study based on outside materials, certain exercises, books, anything, helped the light bulb finally come on. I'm almost to the point where I can function and begin to be a contribution rather than a drain, Monks, just once more (hopefully) I beg for your patience and help in such an elementary matter :).

UPDATE:

Thank you all so much for your responses. :D I do understand the parent-child process much better now, and as far as everything else in process management goes, I am now armed with some fantastic resources, so hopefully a few dozen light bulbs will be coming on in koolgirl's head over the next few weeks. Geeze, I know one day I'm gonna look back on this, ya know, when Rockstar Programming Inc. has outsold everything Microsoft has ever tried to do, and turn bright red with humiliation. However, I will hopefully still have my fellow Monks to laugh about it with ;).

Re: Process Management
by doom (Deacon) on Jan 17, 2009 at 20:21 UTC
    Can you say a little more about what you think you don't get? It's a little hard to know where to begin except to point you at other, more general references...

    Do you understand the idea of time-slicing? The idea is that even if you've just got one processor down there running everything, you can handle multiple jobs at the same time by having the processor constantly switch between jobs. In Unix, each of these jobs is called a "process" (these days there are also "threads", a similar idea: several threads can run inside a single process).

    In Unix-like systems, one process can spin off another process by doing a "fork", which creates a clone of the first process. Note that both the original and the clone then run the same code. The code itself has to be able to figure out whether it's the original process (the "parent") or the clone (the "child"), and it can do that by looking at the return value of the "fork" call. A "fork" returns the process ID of the newly cloned child to the parent, but it returns 0 to the child, so the code can check that return value to find out whether it's the parent or the child.
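    Here is a minimal Perl sketch of that check, assuming a Unix-like system. `fork_roles` is a hypothetical helper name and is not part of any module. The parent gets the child's PID back from fork. The child gets 0 and can still find its parent through getppid. An undefined return means the fork itself failed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fork once and let each process act on what
# fork() returned to it.
sub fork_roles {
    my $parent_pid = $$;
    my $pid = fork;
    die "cannot fork: $!" unless defined $pid;   # undef: the fork failed

    if ($pid == 0) {
        # Child: fork returned 0. Its own PID is in $$, and getppid()
        # returns the PID of the parent that is waiting for it.
        exit( getppid() == $parent_pid ? 0 : 1 );
    }

    # Parent: fork returned the child's PID, a positive integer.
    waitpid($pid, 0);                  # wait for the child and reap it
    my $child_ok = ($? >> 8) == 0;     # exit code 0 means the PPID matched
    return ($pid, $child_ok);
}

my ($child, $ok) = fork_roles();
print "parent $$ forked child $child; child saw the right parent: ",
    ($ok ? "yes" : "no"), "\n";
```

    The parent calls waitpid so that the finished child is reaped instead of lingering as a zombie.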

    Just for the hell of it, here's a demo script I wrote recently to show how you can use a child process to handle a long-running task while the parent goes off and does something else. This script has the child reporting on its status to the parent so that, for example, you can update a "percent done" message while the child is working:

    #!/usr/bin/perl

    =head1 NAME

    child_to_parent_ipc_demo - child process reports on its status to the parent

    =head1 SYNOPSIS

      child_to_parent_ipc_demo

    =head1 DESCRIPTION

    A demonstration of a technique of spinning off a child process
    which can report on its progress back to the parent process,
    while letting the parent go off and do some other things.

    =cut

    use warnings;
    use strict;
    # $|=1;
    our $VERSION = 0.01;

    use IO::Handle;   # brings in autoflush and blocking

    # first set up a pipe; a pipe is one-way, so the parent will
    # close the writing end and the child the reading end
    my ($reader, $writer);
    pipe $reader, $writer or die "cannot create pipe: $!";
    $writer->autoflush(1);   # still needed, even with modern perls
    $reader->blocking(0);    # keeps reads from hanging

    my $pid;
    if ($pid = fork) {
        # this is parent code
        close $writer;
        print "Parent $$ is hearing from child $pid:\n";
        while (1) {
            my $status = <$reader>;   # magic "blocking(0)" keeps
                                      # this from blocking
            if ( defined( $status ) and ( $status =~ m{ ^ \d+ $ }xms ) ) {
                # received an update of the child process status
                chomp( $status );
                print "\n$status% ";
                if ( $status >= 100 ) {   # the child's last report is 100
                    close $reader;
                    waitpid( $pid, 0 );   # reap the finished child
                    print "\n";
                    exit;
                }
            }
            else {
                # can continue doing stuff here while child is running
                print ".";
                sleep 1;
            }
        }
    }
    else {
        # this is child code
        die "cannot fork: $!" unless defined $pid;
        close $reader;

        # child code that takes a while, but reports on its status
        for my $i (1..10) {
            # the child takes a varying amount of time
            my $pause = int( rand(4) + .5 );
            sleep $pause;
            my $status = $i * 10;
            print $writer "$status\n";
        }
        close $writer;   # i/o doesn't go through until closed,
                         # *unless* autoflush is set to 1
        exit;
    }

    __END__

    =head1 SEE ALSO

    L<perlipc>

    The Perl Cookbook, 2nd ed.
      Chapter 16 (interprocess communication)
      Chapter 7 (non-blocking i/o)

    =head1 AUTHOR

    Joseph Brenner, E<lt>doom@kzsu.stanford.eduE<gt>

    =head1 COPYRIGHT AND LICENSE

    Copyright (C) 2008 by Joseph Brenner

    This program is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself, either Perl version 5.8.2 or,
    at your option, any later version of Perl 5 you may have available.

    =head1 BUGS

    None reported... yet.

    =cut

      Although not directly related to the OP's question, this is still important.

      There's a subtle bug in this code that I have seen repeatedly. This code assumes two potential returns from fork. The fork call actually has three potential return values, not two.

      • positive integer - pid of child process returned to the parent process
      • 0 - returned to the child process
      • undef - returned to the parent process if the fork call failed.

      I learned about this when a long-running process started failing after it was moved to a much, much faster machine. It forked enough child processes in a burst to run out of process slots, and then fork failed. Since the return value was not true, the parent went on to do the work as if it were a child and exited when it was done. Now, the main process was gone.
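      One common guard is to retry a few times and treat any other undef as fatal. The sketch below assumes that a temporary shortage of process slots shows up as fork returning undef with $! set to EAGAIN. `careful_fork` is a hypothetical name.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN);

# Hypothetical helper: fork, retrying a few times while the system is
# temporarily out of process slots (fork returns undef, $! == EAGAIN).
# Returns the child's PID in the parent and 0 in the child; any other
# failure dies, so a failed fork is never mistaken for "I am the child".
sub careful_fork {
    my ($max_tries) = @_;
    $max_tries = 5 unless defined $max_tries;
    for my $try (1 .. $max_tries) {
        my $pid = fork;
        return $pid if defined $pid;             # parent (> 0) or child (0)
        die "cannot fork: $!" unless $! == EAGAIN;
        sleep 1;                                 # back off, then retry
    }
    die "cannot fork after $max_tries tries";
}

my $pid = careful_fork(3);
if ($pid == 0) {
    # child work would go here
    exit 0;
}
waitpid($pid, 0);
print "child $pid exited with code ", $? >> 8, "\n";
```

      The key point is that the code tests `defined $pid` before it tests for 0, so the undef case can never fall through into the child branch.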

      Update: As almut pointed out, the undef case was dealt with in the else. I can only say that I'm a little oversensitive to this bug after losing a fair amount of time on the above case.
      G. Wade
        There's a subtle bug...

        Isn't doom already taking care of the 'fork() failed' case (returning undef) with this line?

        ...
        } else {
            # this is child code
            die "cannot fork: $!" unless defined $pid;   # <--
        ...
        That's interesting. So is your point that this is a case where one should check if the return is defined (and not just "true"), or does there need to be a third branch to handle the "0"?

        Neither the examples in perlipc nor those in the Perl Cookbook make any attempt at special handling of a return of "0".

      doom, thank you very much! That explanation cleared up a lot of questions about the parent-child process concepts I've been struggling with. I know my post was so very general, and that bothered me, but I really just don't even know enough about the general concepts of process management to formulate a well-written specific question, so this was the best I could come up with.

Re: Process Management
by shmem (Chancellor) on Jan 17, 2009 at 20:35 UTC

    Do:

    man 2 fork; man 2 execve; man 2 waitpid

    and read a bit.

    • fork - a program clones itself; the clone is the child, the cloning program the parent
    • exec - a program replaces itself with another program (like the 'magic goto' in perl)
    • waitpid - a parent program waits for a child program to change state
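    The sketch below shows a parent waiting for children to "change state" without blocking, using the WNOHANG flag from the core POSIX module. The three children and their exit codes are made up for illustration.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);   # WNOHANG, WIFEXITED, WEXITSTATUS

# Start three children; child $n sleeps a moment and exits with code $n.
my %running;                  # PID => child number
for my $n (1 .. 3) {
    my $pid = fork;
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        sleep 1;
        exit $n;
    }
    $running{$pid} = $n;
}

# Poll without blocking: waitpid($pid, WNOHANG) returns 0 while that
# child is still running, and its PID once it has been reaped.
my %exit_code;                # child number => exit code
while (%running) {
    for my $pid (keys %running) {
        my $res = waitpid($pid, WNOHANG);
        next if $res == 0;    # still running; look again later
        $exit_code{ $running{$pid} } =
            ($res == $pid && WIFEXITED($?)) ? WEXITSTATUS($?) : -1;
        delete $running{$pid};
    }
    select(undef, undef, undef, 0.1);   # the parent could do real work here
}
print "child $_ exited with code $exit_code{$_}\n" for sort keys %exit_code;
```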

    "A child is the exact copy of the parent, until it begins to develop its own ideas."

    So, if a program is to execute another program, it does a fork system call. The clone then execs the program to be run.
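    You can check for yourself that exec replaces the program but not the process. In the sketch below (which assumes /bin/sh is available), the child execs a shell that prints its own PID, and the parent compares that with the PID fork returned.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "cannot create pipe: $!";
my $pid = fork;
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: send STDOUT into the pipe, then replace this Perl program
    # with a shell. Single quotes keep Perl from expanding $$, so the
    # shell prints its own PID.
    close $reader;
    open(STDOUT, '>&', $writer) or die "cannot dup pipe: $!";
    exec 'sh', '-c', 'echo $$';
    die "cannot exec sh: $!";      # reached only if exec failed
}

# Parent: read what the exec'ed shell printed, then reap the child.
close $writer;
my $pid_seen_by_sh = <$reader>;
close $reader;
waitpid($pid, 0);
chomp $pid_seen_by_sh if defined $pid_seen_by_sh;
print "fork returned $pid; the exec'ed shell reported PID ",
    (defined $pid_seen_by_sh ? $pid_seen_by_sh : 'nothing'), "\n";
```

    The two numbers match, because exec loads a new program into the existing child process instead of creating another one.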

    Get a terminal and type in man perlsyn. Then get another terminal and type in

    ps fu | less
    and scroll about, look for perlsyn. You'll see something like
    shmem 28521 0.0 ... 0:00 xterm
    shmem 28523 0.0 ... 0:00  \_ bash
    shmem 28726 0.0 ... 0:00      \_ man perlrun
    shmem 28752 0.0 ... 0:00          \_ sh -c (cd /usr/local/share/man ...
    shmem 28753 0.0 ... 0:00              \_ sh -c (cd /usr/local/share/...
    shmem 28757 0.0 ... 0:00                  \_ less

    Here you have a process chain, which has been setup as follows:

    1. xterm forked and its clone exec'ed bash
    2. bash forked and its clone exec'ed man. The bash is just sitting around in the waitpid(2) system call waiting for man to finish.
    3. man forked and its clone exec'ed a shell, which itself
    4. forked and exec'ed a subshell, which
    5. forked and exec'ed less, which is the pager I use to read manual pages with.

    In the above table, for each process, the number after the user name is the process ID (or PID). bash is the child of xterm and the parent of man. A child can get at its parent's PID via the getppid(2) system call; the result of a fork is the child's PID in the parent, and 0 in the child. By those numbers they know of each other and can start killing each other (or process-manage themselves, if they're well-behaved).
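    Here is a sketch of the "killing each other" part. The parent uses the PID that fork returned to send the child a TERM signal. waitpid and the POSIX wait-status macros then report which signal ended the child. The 30-second sleep simply stands in for real work.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h :signal_h);   # WIFSIGNALED, WTERMSIG, SIGTERM

my $pid = fork;
die "cannot fork: $!" unless defined $pid;
if ($pid == 0) {
    sleep 30;      # child: stand-in for real work
    exit 0;
}

# Parent: it knows the child by the PID fork() returned, so it can
# signal it, then collect the wait status and decode it.
kill('TERM', $pid) or die "cannot signal $pid: $!";
waitpid($pid, 0);
my $sig = WIFSIGNALED($?) ? WTERMSIG($?) : 0;
print "child $pid was ended by signal $sig (SIGTERM is ", SIGTERM(), ")\n";
```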

    Perl's system is essentially a fork/exec and the parent hanging around in waitpid(2).
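    To make that concrete, here is a rough, hypothetical `my_system` that does the fork/exec/waitpid dance by hand and compares its wait status with the built-in system. It is only an approximation. According to perlfunc, the real system also ignores SIGINT and SIGQUIT while it waits, and it hands a single string containing shell metacharacters to /bin/sh. This sketch does neither.

```perl
use strict;
use warnings;

# Hypothetical hand-rolled approximation of system(LIST).
sub my_system {
    my @cmd = @_;
    my $pid = fork;
    return -1 unless defined $pid;           # fork failed, like system()
    if ($pid == 0) {
        exec { $cmd[0] } @cmd or exit 127;   # exit only if exec failed
    }
    waitpid($pid, 0);                        # parent waits; this sets $?
    return $?;                               # raw wait status, as system() returns
}

my $mine    = my_system('sh', '-c', 'exit 3');
my $builtin = system('sh', '-c', 'exit 3');
printf "my_system: exit code %d; built-in system: exit code %d\n",
    $mine >> 8, $builtin >> 8;
```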

    If a program (grandfather) does a fork, and that clone (clone_1) clones itself again into clone_2, and clone_1 exits, you have a double fork, and the last clone is detached from the grandfather and adopted by init(8). See man 8 init.
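    Here is a sketch of the double fork, assuming Linux or another Unix-like system. clone_2 sleeps briefly so that clone_1 has exited by the time clone_2 asks for its parent. Some newer Linux setups use a "subreaper" process, which may adopt the orphan instead of init(8), so the reported parent PID is not necessarily 1.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "cannot create pipe: $!";

my $clone_1 = fork;                      # grandfather forks clone_1
die "cannot fork: $!" unless defined $clone_1;
if ($clone_1 == 0) {
    my $clone_2 = fork;                  # clone_1 forks clone_2
    die "cannot fork: $!" unless defined $clone_2;
    if ($clone_2 == 0) {
        close $reader;
        sleep 1;                         # give clone_1 time to exit
        print {$writer} getppid(), "\n"; # who is my parent now?
        close $writer;
        exit 0;
    }
    exit 0;                              # clone_1 exits right away
}

# Grandfather: reap clone_1 so it does not linger as a zombie, then
# read the parent PID that clone_2 reported.
close $writer;
waitpid($clone_1, 0);
my $new_parent = <$reader>;
close $reader;
chomp $new_parent if defined $new_parent;
print "clone_1 was $clone_1; clone_2 now reports parent ",
    (defined $new_parent ? $new_parent : 'unknown'), "\n";
```

    The grandfather never waits for clone_2. Once clone_1 is gone, clone_2 belongs to whichever process adopted it.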

    Read those manual pages and those provided under 'SEE ALSO'. Recurse.

    You won't understand anything at all at first, but after you have finished the recursion, things will begin to fall into place... ;-)

      Nicely done. Good explanation. pstree could also help visualize ...

      Thanks MH
Re: Process Management
by Lawliet (Curate) on Jan 17, 2009 at 18:43 UTC
    I'm finally approaching the end of the llama
    there seems to be one last hump I'm having a lot of trouble getting over

    You're confusing your animals ;D

    Anyway … your problem reminds me of wise words said by some wise people I have met. Of course, I do not remember what they said but I know the gist of it. The hardest part about learning something new is learning how to think. (Doesn't sound as cool when I say it.)

    If you hang around the culture long enough, you will pick up on things. There are ways to speed up this process (tutorials, for instance) but I cannot help you with that. I just linger and learn as I go.

    Also, Perl and Linux go together like water and food coloring -- so, if you focus on Linux administration, you will probably be using Perl ;)

    And you didn't even know bears could type.

Re: Process Management
by zentara (Cardinal) on Jan 17, 2009 at 20:14 UTC
    On Linux, what you want to do is look through the /proc directory. You will see directories named after each PID, containing all the info and file descriptors for that process. Additionally, there are files in there containing system information. You can google for "linux proc" and get lots of info.

    Perl has a module for looping through it all and creating stats, but it uses a lot of CPU, so often you might want to run system utilities like 'top' and 'ps auxww' through system instead.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Proc::ProcessTable;

    # Dump all the information in the current process table, once a second
    my $t = Proc::ProcessTable->new;
    while (1) {
        foreach my $p ( @{ $t->table } ) {
            print "\n--------------------------------\n";
            foreach my $f ( $t->fields ) {
                my $value = $p->{$f};
                print $f, ": ", ( defined $value ? $value : '' ), " ";
            }
        }
        sleep(1);
    }
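    If you'd rather not install a module, you can read the /proc layout described above directly. This is a Linux-only sketch, and `read_proc_table` is a hypothetical helper. It relies on each /proc/<pid>/stat line starting with the PID, then the command name in parentheses, the state, and the parent PID. That gives you the same parent-child map that ps draws as a tree.

```perl
use strict;
use warnings;

# Hypothetical helper (Linux only): build { pid => { comm, ppid } } from
# /proc/<pid>/stat, whose line begins "pid (comm) state ppid ...".
sub read_proc_table {
    my %proc;
    opendir(my $dh, '/proc') or die "cannot open /proc: $!";
    for my $pid (grep { /^\d+$/ } readdir $dh) {
        open(my $fh, '<', "/proc/$pid/stat") or next;   # process may be gone
        my $line = <$fh>;
        close $fh;
        next unless defined $line;
        # comm can contain spaces or ')', so match up to the LAST ')'
        my ($comm, $rest) = $line =~ /^\d+ \((.*)\) (.*)$/s or next;
        my (undef, $ppid) = split ' ', $rest;           # state, ppid, ...
        $proc{$pid} = { comm => $comm, ppid => $ppid };
    }
    closedir $dh;
    return \%proc;
}

my $table = read_proc_table();
for my $pid (sort { $a <=> $b } keys %$table) {
    printf "%6d %6d %s\n", $pid, $table->{$pid}{ppid}, $table->{$pid}{comm};
}
```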

    I'm not really a human, but I play one on earth Remember How Lucky You Are
      system utilities like 'top', and 'ps auxww'

      ps's option 'f' might be of particular interest to the OP, as it nicely visualizes the parent-child relationships of the processes in a tree-like fashion.  (My personal default choice of ps options thus also is 'ps axf', to see what's running on the system.)

Re: Process Management
by zwon (Abbot) on Jan 17, 2009 at 18:49 UTC

    I think it would be better to read something more UNIX specific if you want to learn how processes on UNIX work. I personally can recommend Stevens's Advanced Programming in the UNIX Environment or Unix System Programming by Keith Haviland, Dina Gray, and Ben Salama.

      I second the recommendation of Advanced Programming in the UNIX Environment, Second Edition. This is a great book to get if you want to code under a *nix system. It's also a great example of really, really good technical writing.


      Life is denied by lack of attention,
      whether it be to cleaning windows
      or trying to write a masterpiece...
      -- Nadia Boulanger
Re: Process Management
by skirnir (Monk) on Jan 17, 2009 at 20:32 UTC

    Maybe I'm mistaken here but reading your question I got the impression that you think you have to master process management before you are prepared to take on your project. I think you should just go for it, picking up knowledge as needed. If the project you're working on isn't trivial, you'll inevitably be hitting walls, usually where you didn't expect them to be, so you may as well just get going and roll with the punches.

    Having said that, I found The Linux Documentation Project quite helpful. The howtos and guides there aren't exactly up to date, but they're still a valuable source of information and cover many topics.

      ++ skirnir :) I am going to move forward, with the help of all the wonderful resources I've discovered from all the Monks' responses of course, and bust this project out regardless of my insecurity about my skills. ;)

      UPDATE:

      Apparently something about this comment offended a few people. Haven't the slightest clue why. I'm only stating that I'm going to attempt to begin this project, regardless of my lack of certain skills, and hopefully accomplish something worthwhile thanks to all of the resources these helpful responses have given me. So, thanking people for the time and effort they've given in trying to help me and attempting to learn by trying...sorry for being offensive, my bad.

Re: Process Management
by graff (Chancellor) on Jan 18, 2009 at 02:09 UTC
    I'm desperate for ideas on how to move forward, as I am so incredibly anxious to finish this book and begin working on my projects.

    This seems to imply that the projects you have in mind are likely to involve inter-process communication (IPC), child processes and the like. If so, it might move things forward more quickly to describe what you have in mind to implement, and what parts of the implementation are still mysterious to you.

    Any notion at all that you can come up with about how to split a project into distinct processes is good enough for a start. Try something out, maybe even try a couple of variations, and if you get into trouble with that, share the details of the specific problem. General knowledge in the absence of particular usage is of questionable value.

    Also, I see you've been reading a lot of books. Have you been reading manual pages too? If not, you need to do that. For the question at hand, if you haven't read perlipc, then that is the next thing you should do. You probably won't understand all of it (I suspect there are parts I don't quite understand yet), but you should be able to locate sections that will be relevant to what you want to do next, and you should be able to make some sense of those sections -- at least, enough to try a few things out.

    Then again, if the projects you have in mind don't really depend crucially on process management and IPC, then why put off moving ahead with these projects? Let the IPC know-how come to you if/when the need arises. Just because you lack this know-how at the moment doesn't mean you can't do anything.

Re: Process Management
by missingthepoint (Friar) on Jan 18, 2009 at 05:13 UTC
    I'm desperate for ideas on how to move forward

    One piece of advice has served me VERY well when it comes to programming.

    "When faced with a problem you do not understand, do any part of it you do understand, then look at it again." -- RA Heinlein, The Moon is a Harsh Mistress

Re: Process Management
by gctaylor1 (Hermit) on Jan 17, 2009 at 23:39 UTC
    I too am a beginner and would echo the sentiments of skirnir. That is, you don't have to master everything in the book. Unless of course your project needs it. Often when there's a concept I can't comprehend or have zero knowledge of I panic and immediately think I need to buy a book, but if I'm patient and keep at it, I'll find something that shows me the way.

    If you really want to buy a book and/or browse through it, look at Perl by Example by Ellie Quigley. It does cover much about processes and forks, and PIDs, and much other stuff that I'm interested in but will wait until I need more depth. In fact I've been working through this book and got tired of it so took a diversion through a small project or two, and attempted CPAN testing, and Net::Telnet, and, and, etc. I always know where the information is when I need it.

Re: Process Management (Linux Sysadmin)
by matze77 (Friar) on Jan 18, 2009 at 11:43 UTC

    Hmm. I have read your post twice now. Since I am a beginner, I am not sure if I could help you on a specific topic; maybe you could provide a little code or pseudocode from your project which explains what you want to achieve and what you tried.

    What I have learned is that I learn fastest if I have a goal: a project which I want to finish (or others want me to do). Learning without a "goal" is not as much fun, because you don't see the little progress you make in small steps toward a given subject.

    For sysadmin practice:
    I got a root server some years back and implemented a mail server on it (and other web services) just for learning. I learned so many sysadmin tasks just "by the way"; it was, and is, really fun, and it is not only theory. So if you can spend some money, you could rent a virtual root server for practice, for example, or get your own home server. Choosing a project which you like is important, and you learn a lot.


    Most useful books for me on Linux sysadmin (I have a shelf of them, but these are the most valuable; please check for the latest editions):
    Evi Nemeth, Linux Administration Handbook ISBN 9780131480049
    (I like the exercises, they are really demanding. My brother was a total Linux beginner and it helped him at his company too; it must be good if Linus Torvalds himself writes a foreword ;-))
    How Linux Works, Brian Ward ISBN 9781593270353
    Unix Administration, Aeleen Frisch ISBN 9783897213470
    (a little old, but the concepts haven't changed; it is a classic imho...)
    Thanks in Advance MH
Re: Process Management
by ig (Vicar) on Jan 24, 2009 at 20:58 UTC

    I learned about processes in unix by reading

    S. Leffler, M. McKusick, M. Karels, J. Quarterman, The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley Publishing Company, Reading, MA January 1989, ISBN 0-201-06196-1.

    It gives clear and detailed descriptions of how the operating system works, including what processes are and how they are managed.

    This book, 4.3BSD (and me) are quite old now, but the concepts are still clear and relevant.

    There is a much more recent book by McKusick and Neville-Neil:

    M. McKusick, G. Neville-Neil, The Design and Implementation of the FreeBSD Operating System, Addison-Wesley Publishing Company, Reading, MA. April 2005, (ISBN 0-201-70245-2).

    The text of Chapter 4: Process Management, is available on-line.

    Implementation details will be different for other flavours of *nix (e.g. linux) but the basic concepts of what a process is and how it is managed will remain consistent.

    Update: corrected the reference to the old book.

Re: Process Management
by Argel (Prior) on Jan 20, 2009 at 02:13 UTC