cspctec has asked for the wisdom of the Perl Monks concerning the following question:

I have the following data (output from "ps -eo pid,ppid,comm"):
 PID PPID COMMAND
   0    0 sched
   1    0 /sbin/init
   7    0 vmtasks
 105    1 /usr/lib/saf/sac
7184    1 /usr/bin/java
7222    1 /usr/lib/utmpd
7501 6223 /usr/sbin/nscd
7507 7184 /bin/sh
7508 7507 /usr/bin/perl
7510 5044 /usr/bin/grep
7512 4333 /usr/bin/egrep
7515 7508 sh
7516 7515 <defunct>

and I'm trying to write a script that will output a "trace" of a defunct process, so to speak. It would output something like this:

   1    0 /sbin/init
7184    1 /usr/bin/java
7507 7184 /bin/sh
7508 7507 /usr/bin/perl
7515 7508 sh
7516 7515 <defunct>

I'm having trouble deciding what to use to store this data. I'm thinking of a hash, but I need to keep track of three different values. Would a multi-dimensional hash work in this case? I've been reading about multi-dimensional hashes, but I'm still unsure whether I should be using one here. The output of the ps command is stored in a file, and the file is large, so the script needs to be efficient. Thanks for any suggestions.
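A hash of hashes (one form of multi-dimensional hash) does fit three values per line. The sketch below is illustrative only: it assumes whitespace-separated PID PPID COMMAND columns as shown above, the sub names `parse_ps` and `ancestry` are hypothetical, and an in-memory string stands in for the real file (in practice you would `open` the file and pass that handle instead). The file is read once, line by line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Parse "ps -eo pid,ppid,comm" output into a hash of hashes keyed by PID:
#   $proc{PID} = { ppid => PPID, cmd => COMMAND }
sub parse_ps {
    my ($fh) = @_;
    my %proc;
    while (my $line = <$fh>) {
        chomp $line;
        # LIMIT 3 keeps a command that contains spaces in one field.
        my ($pid, $ppid, $cmd) = split ' ', $line, 3;
        # Skip the header line and anything that does not start with a PID.
        next unless defined $cmd && $pid =~ /^\d+$/;
        $proc{$pid} = { ppid => $ppid, cmd => $cmd };
    }
    return \%proc;
}

# Follow the PPID links upwards and return "PID PPID COMMAND" strings,
# oldest ancestor first. Stops before PID 0 or at a parent missing from the data.
sub ancestry {
    my ($proc, $pid) = @_;
    my @chain;
    while ($pid >= 1 && exists $proc->{$pid}) {
        my $p = $proc->{$pid};
        unshift @chain, "$pid $p->{ppid} $p->{cmd}";
        $pid = $p->{ppid};
    }
    return @chain;
}

# In-memory stand-in for the ps output file (data from the question).
my $sample = <<'END';
 PID PPID COMMAND
   0    0 sched
   1    0 /sbin/init
   7    0 vmtasks
 105    1 /usr/lib/saf/sac
7184    1 /usr/bin/java
7222    1 /usr/lib/utmpd
7501 6223 /usr/sbin/nscd
7507 7184 /bin/sh
7508 7507 /usr/bin/perl
7510 5044 /usr/bin/grep
7512 4333 /usr/bin/egrep
7515 7508 sh
7516 7515 <defunct>
END

open my $fh, '<', \$sample or die "Cannot open sample data: $!";
my $proc = parse_ps($fh);
close $fh;

# One trace per <defunct> process.
for my $pid (sort { $a <=> $b } grep { $proc->{$_}{cmd} eq '<defunct>' } keys %$proc) {
    print "$_\n" for ancestry($proc, $pid);
    print "\n";
}
```

Because each PID is a hash key, every step up the tree is a single lookup, so the walk costs one lookup per ancestor regardless of how large the file is.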

Re: Help parsing this data (3)
by tye (Sage) on Aug 13, 2015 at 03:24 UTC

    I'd use several different data structures: a hash of arrays, an array, and an array of arrays.

    use strict;

    my %byPid = (
        0    => [ 0,    'sched' ],
        1    => [ 0,    '/sbin/init' ],
        7    => [ 0,    'vmtasks' ],
        105  => [ 1,    '/usr/lib/saf/sac' ],
        7184 => [ 1,    '/usr/bin/java' ],
        7222 => [ 1,    '/usr/lib/utmpd' ],
        7501 => [ 6223, '/usr/sbin/nscd' ],
        7507 => [ 7184, '/bin/sh' ],
        7508 => [ 7507, '/usr/bin/perl' ],
        7510 => [ 5044, '/usr/bin/grep' ],
        7512 => [ 4333, '/usr/bin/egrep' ],
        7515 => [ 7508, 'sh' ],
        7516 => [ 7515, '<defunct>' ],
    );
    my @defunct = ( 7516 );

    for my $d ( @defunct ) {
        my @layers;
        my $pid = $d;
        while( 1 ) {
            # Stop if the parent is not in the listing (dereferencing
            # undef would die under strict refs).
            my $entry = $byPid{$pid} or last;
            my( $parent, $cmd ) = @$entry;
            unshift @layers, [ $pid, $parent, $cmd ];
            last if $pid == $parent;   # PID 0 is its own parent
            $pid = $parent;
        }
        printf "%6s %6s %s\n", 'PID', 'PPID', 'Cmd';
        for my $layer ( @layers ) {
            printf "%6d %6d %s\n", @$layer;
        }
        print "\n";
    }
    __END__
       PID   PPID Cmd
         0      0 sched
         1      0 /sbin/init
      7184      1 /usr/bin/java
      7507   7184 /bin/sh
      7508   7507 /usr/bin/perl
      7515   7508 sh
      7516   7515 <defunct>

    - tye        
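    tye's loop stops when a PID is listed as its own parent, as sched (PID 0) is. A more general stop condition, sketched below with a hypothetical `layers_for` helper and a trimmed copy of the data, remembers every PID already visited in a `%seen` hash and also ends the walk when a parent (such as 6223 for nscd) is missing from the listing. Within a single ps snapshot no loop other than PID 0 should occur, so treat the `%seen` check as a cheap safety net rather than a requirement.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same layout as the %byPid hash above: PID => [ PPID, COMMAND ].
my %byPid = (
    0    => [ 0,    'sched' ],
    1    => [ 0,    '/sbin/init' ],
    7184 => [ 1,    '/usr/bin/java' ],
    7501 => [ 6223, '/usr/sbin/nscd' ],   # parent 6223 is absent from the listing
    7507 => [ 7184, '/bin/sh' ],
    7508 => [ 7507, '/usr/bin/perl' ],
    7515 => [ 7508, 'sh' ],
    7516 => [ 7515, '<defunct>' ],
);

# Hypothetical helper: return [ PID, PPID, COMMAND ] layers, root first.
# The walk ends at a PID with no entry, or at any PID already visited,
# which also covers PID 0 being listed as its own parent.
sub layers_for {
    my ($byPid, $pid) = @_;
    my (@layers, %seen);
    while (my $entry = $byPid->{$pid}) {
        last if $seen{$pid}++;
        my ($parent, $cmd) = @$entry;
        unshift @layers, [ $pid, $parent, $cmd ];
        $pid = $parent;
    }
    return @layers;
}

for my $d (7516, 7501) {
    printf "%6s %6s %s\n", 'PID', 'PPID', 'Cmd';
    printf "%6d %6d %s\n", @$_ for layers_for(\%byPid, $d);
    print "\n";
}
```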

Re: Help parsing this data
by Tux (Canon) on Aug 13, 2015 at 06:30 UTC

    Once you have taken the fine advice from the other replies and got your trace in place, take a moment to look at Proc::ProcessTable: parsing the output of the ps command into a tree is one of the least reliable ways to do it. Not only might the output change suddenly if a system update replaces the ps program (yes, that can happen too), but it will certainly be different on a different OS!

    The output of ps might be similar across Linux distributions, but do not try to port your script to OpenBSD, AIX, HP-UX, Solaris, OSX, Windows, VMS, or OSF/1. Your first surprise may be that many of the options you used do not even exist!

    HP-UX $ ps -eo pid,ppid,comm
    ps: illegal option -- o
    usage: ps [-edaxzflP] [-u ulist] [-g glist] [-p plist] [-t tlist] [-R prmgroup] [-Z psetidlist]

    AIX $ ps -eo pid,ppid,comm
         PID   PPID COMMAND
           1      0 init
       78030 143498 xntpd
    :

    NetBSD $ ps -eo pid,ppid,comm
      PID  PPID COMMAND
     5908 16576 USER=tux LOGNAME=tux HOME=/home/tux PATH=/usr/bin:/bin:/usr/pkg/bin:/u
    22789  5908 USER=tux LOGNAME=tux HOME=/home/tux PATH=/home/tux/bin:/usr/local/bin:

    Windows C:\Users\Tux>ps -eo pid,ppid,comm
    'ps' is not recognized as an internal or external command,
    operable program or batch file.

    Enjoy, Have FUN! H.Merijn
      I definitely agree with you, Tux: using the output of a ps shell command is usually not a good idea. I was on the verge of writing just that in my previous post, but I refrained, because it seems that the OP really is reading from a large file containing the output of the ps command. So this appears to be a one-off task on an existing file holding a historical trace of what happened on a Linux box some time ago (a kind of post-mortem analysis).
Re: Help parsing this data
by talexb (Chancellor) on Aug 13, 2015 at 01:17 UTC

    It looks like you want to build a family tree, so perhaps you'd like to use an array of hashes (AoH), with PID and COMMAND entries for each element, as you go down the family tree.

    So the data structure for your data would look something like this:

    my @trace = (
        { pid => 1,    command => '/sbin/init' },
        { pid => 7184, command => '/usr/bin/java' },
        { pid => 7507, command => '/bin/sh' },
        { pid => 7508, command => '/usr/bin/perl' },
        { pid => 7515, command => 'sh' },
        { pid => 7516, command => '<defunct>' },
    );
    The info I've left out is the PPID, but that would just be the previous entry's PID.
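    A minimal sketch of that idea, assuming a hypothetical `%ps` lookup (PID => [PPID, COMMAND]) has already been built while reading the file: it builds `@trace` for PID 7516 and then recovers each PPID from the previous element, falling back to the lookup only for the first element.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical lookup, built while reading the ps output: PID => [ PPID, COMMAND ].
my %ps = (
    0    => [ 0,    'sched' ],
    1    => [ 0,    '/sbin/init' ],
    7184 => [ 1,    '/usr/bin/java' ],
    7507 => [ 7184, '/bin/sh' ],
    7508 => [ 7507, '/usr/bin/perl' ],
    7515 => [ 7508, 'sh' ],
    7516 => [ 7515, '<defunct>' ],
);

# Build the AoH for one defunct PID, oldest ancestor first (PID 0 excluded).
my @trace;
for (my $pid = 7516; $pid >= 1 && exists $ps{$pid}; $pid = $ps{$pid}[0]) {
    unshift @trace, { pid => $pid, command => $ps{$pid}[1] };
}

# The PPID is not stored in @trace: it is the previous element's PID.
# Only the first element needs the lookup hash for its parent.
for my $i (0 .. $#trace) {
    my $ppid = $i > 0 ? $trace[$i - 1]{pid} : $ps{ $trace[0]{pid} }[0];
    printf "%d %d %s\n", $trace[$i]{pid}, $ppid, $trace[$i]{command};
}
```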

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Help parsing this data
by Laurent_R (Canon) on Aug 13, 2015 at 06:26 UTC
    I would use a simple hash, with the PIDs as the keys, and the full lines (or possibly the line without the PID) as values:
    ( ..., 7510 => "7510 5044 /usr/bin/grep", 7512 => "7512 4333 /usr/bin/egrep", 7515 => "7515 7508 sh", 7516 => "7516 7515 <defunct>" )
    It is then very easy to read the PPID of the defunct process (7515), then look up the parent (7515 is its key), find the grandparent's PID (7508), look up the grandparent, and so on.

    No need for a more complicated data structure, and this will be very fast.
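    A sketch of that approach, assuming single-space-separated lines as in the question and a hypothetical `trace_lines` helper. The PPID is pulled back out of the stored line with `split` at each step, so nothing but the lines themselves is kept.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $ps_output = <<'END';
PID PPID COMMAND
0 0 sched
1 0 /sbin/init
7 0 vmtasks
105 1 /usr/lib/saf/sac
7184 1 /usr/bin/java
7222 1 /usr/lib/utmpd
7501 6223 /usr/sbin/nscd
7507 7184 /bin/sh
7508 7507 /usr/bin/perl
7510 5044 /usr/bin/grep
7512 4333 /usr/bin/egrep
7515 7508 sh
7516 7515 <defunct>
END

# Simple hash: PID => the full ps line.
my %line_for;
for my $line (split /\n/, $ps_output) {
    my ($pid) = $line =~ /^\s*(\d+)\s/ or next;   # the header has no numeric PID
    $line_for{$pid} = $line;
}

# Hypothetical helper: walk from $pid upwards, reading each PPID out of the
# stored line, and return the lines oldest ancestor first.
sub trace_lines {
    my ($line_for, $pid) = @_;
    my @lines;
    while ($pid >= 1 && exists $line_for->{$pid}) {
        my $line = $line_for->{$pid};
        unshift @lines, $line;
        $pid = (split ' ', $line)[1];   # second field is the PPID
    }
    return @lines;
}

print "$_\n" for trace_lines(\%line_for, 7516);
```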

Re: Help parsing this data
by Anonymous Monk on Aug 13, 2015 at 02:31 UTC

    Hash-free!

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1138357
    use strict;
    use warnings;

    $_ = <<END;
    PID PPID COMMAND
    0 0 sched
    1 0 /sbin/init
    7 0 vmtasks
    105 1 /usr/lib/saf/sac
    7184 1 /usr/bin/java
    7222 1 /usr/lib/utmpd
    7501 6223 /usr/sbin/nscd
    7507 7184 /bin/sh
    7508 7507 /usr/bin/perl
    7510 5044 /usr/bin/grep
    7512 4333 /usr/bin/egrep
    7515 7508 sh
    7516 7515 <defunct>
    END

    for my $defunct ( /^\s*(\d+)\s+\d+\s+<defunct>/gm )
    {
        my $pid = $defunct;
        my $answer = '';
        # Prepend each ancestor's line until PID 0 is reached.
        while( $pid > 0 && /^(\s*$pid\s+(\d+).*\n)/m )
        {
            $answer = $1 . $answer;
            $pid = $2;
        }
        print "$answer\n";
    }
Re: Help parsing this data
by kcott (Archbishop) on Aug 13, 2015 at 10:21 UTC

    G'day cspctec,

    I'd read through the ps output adding key-value pairs to a hash with this format:

    PID => [ PPID, COMMAND ]

    At the same time, I'd add any '<defunct>' PIDs to an array.

    Then, it's a simple matter to iterate the array of defunct pids and generate the trace from the data in the hash. Here's the sample code (which handles multiple defunct pids):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my (%ps_data, @defunct_processes);

    while (<DATA>) {
        next if $. == 1; # Skip header: 'PID PPID COMMAND'
        chomp;
        my ($pid, $ppid, $cmd) = split ' ', $_, 3;
        $ps_data{$pid} = [$ppid => $cmd];
        push @defunct_processes, $pid if $cmd eq '<defunct>';
    }

    for my $pid (@defunct_processes) {
        my @trace;
        while ($pid >= 1) {
            push @trace, [$pid, @{$ps_data{$pid}}];
            $pid = $ps_data{$pid}[0];
        }
        printf "%4d %4d %s\n", @$_ for reverse @trace;
    }

    __DATA__
     PID PPID COMMAND
       0    0 sched
       1    0 /sbin/init
       7    0 vmtasks
     105    1 /usr/lib/saf/sac
    7184    1 /usr/bin/java
    7222    1 /usr/lib/utmpd
    7501 6223 /usr/sbin/nscd
    7507 7184 /bin/sh
    7508 7507 /usr/bin/perl
    7510 5044 /usr/bin/grep
    7512 4333 /usr/bin/egrep
    7515 7508 sh
    7516 7515 <defunct>

    Output:

       1    0 /sbin/init
    7184    1 /usr/bin/java
    7507 7184 /bin/sh
    7508 7507 /usr/bin/perl
    7515 7508 sh
    7516 7515 <defunct>

    Note that split is called with a LIMIT of 3. This allows '/path/with spaces/to/command' to be captured in full; without it, you'd only capture '/path/with'.
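    A small illustration of the LIMIT argument, using a made-up ps line whose command path contains a space:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up ps line: the command path contains a space.
my $line = ' 7600  7184 /opt/my app/bin/server';

# split ' ' skips leading whitespace and splits on runs of whitespace.
my @no_limit = split ' ', $line;      # ('7600', '7184', '/opt/my', 'app/bin/server')
my @limit_3  = split ' ', $line, 3;   # ('7600', '7184', '/opt/my app/bin/server')

print "without LIMIT: $no_limit[2]\n";
print "with LIMIT 3:  $limit_3[2]\n";
```

    With the limit, everything after the second separator stays in the third field, spaces included.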

    — Ken

Re: Help parsing this data
by Anonymous Monk on Aug 13, 2015 at 13:08 UTC

    Memoization might make this faster for a large file. Or maybe not. Benchmarking is left to the user :)

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1138357
    use strict;
    use warnings;

    $_ = <<END;
    PID PPID COMMAND
    0 0 sched
    1 0 /sbin/init
    7 0 vmtasks
    105 1 /usr/lib/saf/sac
    7184 1 /usr/bin/java
    7222 1 /usr/lib/utmpd
    7501 6223 /usr/sbin/nscd
    7507 7184 /bin/sh
    7508 7507 /usr/bin/perl
    7510 5044 /usr/bin/grep
    7512 4333 /usr/bin/egrep
    7515 7508 sh
    7516 7515 <defunct>
    8000 1 /second/example/for/testing
    8001 8000 <defunct>
    END

    # %pids: PID => [ PPID, full line ]; %memoize caches finished traces.
    my (%pids, %memoize);
    $pids{$2} = [$3, $1] while /^(\s*(\d+)\s+(\d+).*\n)/gm;

    sub trace
    {
        $memoize{$_[0]} //= do
        {
            my ($ppid, $line) = @{ $pids{shift()} };
            ($ppid ? trace($ppid) : "\n") . $line;
        }
    }

    print trace $_ for /^\s*(\d+)\s+\d+\s+<defunct>/gm;
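    For comparison, the core Memoize module can provide the cache that the code above builds by hand with `%memoize` and `//=`. This is a sketch on a trimmed copy of the same data, not the reply's own code; as the reply says, whether caching actually pays off on a large file is something to benchmark.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Memoize;   # core module

my $ps_output = <<'END';
PID PPID COMMAND
0 0 sched
1 0 /sbin/init
7184 1 /usr/bin/java
7507 7184 /bin/sh
7508 7507 /usr/bin/perl
7515 7508 sh
7516 7515 <defunct>
8000 1 /second/example/for/testing
8001 8000 <defunct>
END

# PID => [ PPID, full line including its newline ]
my %pids;
while ($ps_output =~ /^(\s*(\d+)\s+(\d+).*\n)/gm) {
    $pids{$2} = [ $3, $1 ];
}

# The trace of a PID is the trace of its parent followed by its own line.
sub trace {
    my ($pid) = @_;
    my $entry = $pids{$pid} or return '';   # unknown PID ends the chain
    my ($ppid, $line) = @$entry;
    return ($ppid ? trace($ppid) : '') . $line;
}

# Replace trace() with a caching wrapper; the recursive call above goes
# through the wrapper too, so shared ancestors are computed only once.
memoize('trace');

my @defunct = $ps_output =~ /^\s*(\d+)\s+\d+\s+<defunct>/gm;
print trace($_) . "\n" for @defunct;
```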