JonesyJones has asked for the wisdom of the Perl Monks concerning the following question:

Perl Monks,

I have a script that looks for process ids running on a system, so I take the output from ps on Red Hat Linux (or tasklist on Windows), and store those values in an array. The odd thing is that I am always missing the last digit of the process id (on Windows and Linux). I have written the regex two different ways, but achieve the same result. Can you see what I am doing wrong?

Method 1

sub getPids{
    my ($processName,$owner) = @_;
    my $ps;
    my $pid;
    my $command = "/bin/ps auwwwx | grep $processName | grep -v grep | grep -v 'sh -c' ";
    $ps = `$command`;
    # print $ps;
    my @lines = split( "\n",$ps);
    # print @lines;
    foreach my $line (@lines){
        # print $line . "\n";
        if ($line =~ qr|$owner\s+(\d*)\w+\s+\d+.+processName=$processName|){
            $pid = $line =~ qr|$owner\s+(\d*)\w+\s+\d+.+processName=$processName|;
            say "Found $processName in getPids() owned by $owner. PID is $1";
            #print "$1\n";
        }else{
            say "Loser";
        }
    }
}

Method 2

if ($line =~ qr|$owner\s+(\w+)\w+\s+\d+.+processName=$processName|){
    $pid = $line =~ qr|$owner\s+(\w+)\w+\s+\d+.+processName=$processName|;
    say "Found $processName in getPids() owned by $owner. PID is $1";
    #print "$1\n";
}else{
    say "Loser";
}

Replies are listed 'Best First'.
Re: Regex and PID issue
by stevieb (Canon) on Jun 16, 2016 at 18:32 UTC

    The regex fails because of this portion: (\d*)\w+\s+. It captures zero or more digits in a group (\d*), then requires one or more word characters (\w+), followed by at least one whitespace character (\s+).

    A digit is a word character. Since the only thing immediately after the proc ID is whitespace, the \w+ has nothing of its own to match, so it steals the last digit from the capture, just before the whitespace matched by \s+. It only takes one, because \d* is greedy: it first gulps all the digits, the overall match then fails at \w+, and the engine backtracks a single digit so that \w+ can succeed.
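    The backtracking described above can be reproduced in isolation. The following is a minimal sketch, not the original script: the sample line is the ps output quoted in this thread, and the helper name first_capture is hypothetical. It runs the Method 1 pattern, the Method 2 pattern, and a pattern without the extra \w+ against the same line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group of $re applied to
# $line, or undef if the pattern does not match.
sub first_capture {
    my ($line, $re) = @_;
    my ($got) = $line =~ $re;    # list context returns the captures
    return $got;
}

# Sample ps line taken from the output quoted in this thread.
my $line  = 'ubuntu 9377 0.0 0.1 23668 1600 pts/2 S+ 18:23 0:00 top';
my $owner = 'ubuntu';

# Method 1: \d* must give one digit back so that \w+ can match.
my $method1 = first_capture($line, qr/$owner\s+(\d*)\w+\s+/);

# Method 2: same effect, because \w also matches digits.
my $method2 = first_capture($line, qr/$owner\s+(\w+)\w+\s+/);

# Without the extra \w+, the capture keeps every digit.
my $fixed = first_capture($line, qr/$owner\s+(\d+)\s+/);

print "method 1: $method1\n";    # prints "method 1: 937"
print "method 2: $method2\n";    # prints "method 2: 937"
print "fixed:    $fixed\n";      # prints "fixed:    9377"
```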

    The regex can be simplified a bit: qr|$owner\s+(\d+).*\s+$processName$|. Explained:

    qr|
        $owner          # proc owner
        \s+             # one or more whitespace
        (\d+)           # capture one or more digits
        .*              # everything up until...
        \s+             # the last whitespace
        $processName    # proc name
        $               # end of string
    |x

    In action:

    use warnings;
    use strict;
    use feature 'say';

    getPids('top', 'ubuntu');

    sub getPids{
        my ($processName,$owner) = @_;
        my $ps;
        my $pid;
        my $command = "/bin/ps auwwwx | grep $processName | grep -v grep | grep -v 'sh -c' ";
        $ps = `$command`;
        # print $ps;
        my @lines = split( "\n",$ps);
        # print @lines;
        foreach my $line (@lines){
            if ($line =~ qr|$owner\s+(\d+).*\s+$processName$|){
                say "Found $processName in getPids() owned by $owner. PID is $1";
            }else{
                say "Loser";
            }
        }
    }

    Output:

    Found top in getPids() owned by ubuntu. PID is 9377

    # orig:
    ubuntu 9377 0.0 0.1 23668 1600 pts/2 S+ 18:23 0:00 top
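    A variation that may be easier to test, offered as a sketch under stated assumptions rather than as the script's actual design: the matching is moved into a helper that takes already-captured ps lines, so no ps call is needed to try it. The name pids_from_lines is hypothetical. The pattern adds \Q...\E around the interpolated names so that characters such as "." in a process name are matched literally, anchors the owner at the start of the line, and drops the end-of-string anchor so that commands with trailing arguments still match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the PIDs of lines owned by $owner whose
# command column mentions $process_name. Takes ps output lines as input.
sub pids_from_lines {
    my ($process_name, $owner, @lines) = @_;
    my @pids;
    for my $line (@lines) {
        # \Q...\E quotes regex metacharacters in the interpolated names.
        # (\d+) followed by \s keeps every digit of the PID.
        if ($line =~ /^\Q$owner\E\s+(\d+)\s.*\Q$process_name\E/) {
            push @pids, $1;
        }
    }
    return @pids;
}

# Sample lines taken from the ps output quoted in this thread.
my @sample = (
    'ubuntu 9377 0.0 0.1 23668 1600 pts/2 S+ 18:23 0:00 top',
    'root 1435 0.0 0.0 147236 7548 ? Ss 08:46 0:02 /usr/local/httpd-2.4.16/bin/httpd -k start',
);

my @top_pids   = pids_from_lines('top',   'ubuntu', @sample);
my @httpd_pids = pids_from_lines('httpd', 'root',   @sample);

print "top PIDs: @top_pids\n";        # prints "top PIDs: 9377"
print "httpd PIDs: @httpd_pids\n";    # prints "httpd PIDs: 1435"
```

    In the real script, the lines would come from the ps command shown above, split on newlines as before.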
      Thanks for the breakdown on the regex, I am new to it. I agree with you and the simplification. The problem I have is with $1. The value in there is missing the last digit.

        Did you actually read my post? Did you try my code? Did you compare my output to the actual command line output?

Re: Regex and PID issue
by AnomalousMonk (Archbishop) on Jun 16, 2016 at 19:06 UTC
    ... I am always missing the last digit of the process id ...
    qr|$owner\s+(\d*)\w+...|
    qr|$owner\s+(\w+)\w+...|

    If your $line is like the examples given by hippo and stevieb, the \w+ immediately following the capturing (\d*) (or following the (\w+) in your Method 2) requires that the captured decimal sequence give back one digit in order to achieve an overall match. (Remember that \w includes the \d character class.)

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $line = 'ubuntu 9377 0.0 0.1 23668 1600 pts/2 S+ 18:23 0:00 top';
     my $owner = 'ubuntu';
     ;;
     $line =~ m{ $owner \s+ (\d*) (\w+) }xms;
     print qq{captured: (\\d*) ($1) (\\w+) ($2)};
    "
    captured: (\d*) (937) (\w+) (7)

    The regex engine always does whatever is necessary to achieve an overall match.

    Update: Oops... Didn't carefully read stevieb's post above, wherein all this is explained — and a bit better!


    Give a man a fish:  <%-{-{-{-<

Re: Regex and PID issue
by hippo (Archbishop) on Jun 16, 2016 at 18:22 UTC

    This somewhat simpler regex works for me:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Test::More tests => 1;

    my $line = 'root 1435 0.0 0.0 147236 7548 ? Ss 08:46 0:02 /usr/local/httpd-2.4.16/bin/httpd -k start';
    my $processName = 'httpd';
    my $owner = 'root';
    my ($pid) = $line =~ qr|$owner\s+(\d*)\s+\d.+$processName|;
    is ($pid, '1435', 'Match');
      I agree with you and the simplification. The problem I have is with $1. The value in there is missing the last digit.

        So, add a test for that as well:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Test::More tests => 2;

        my $line = 'root 1435 0.0 0.0 147236 7548 ? Ss 08:46 0:02 /usr/local/httpd-2.4.16/bin/httpd -k start';
        my $processName = 'httpd';
        my $owner = 'root';
        my ($pid) = $line =~ qr|$owner\s+(\d*)\s+\d.+$processName|;
        is ($pid, '1435', 'Match');
        is ($1, '1435', 'Match');
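
        One detail worth noting about the code in the original question: it assigns with $pid = $line =~ qr|...|, without parentheses around $pid. That is a scalar-context match, which returns the success flag and not the captured text, whereas the tests here use my ($pid) = ..., which is list context. A minimal sketch of the difference, using the same sample line (the variable names are chosen for illustration):

```perl
use strict;
use warnings;

# Sample ps line from the tests in this thread.
my $line = 'root 1435 0.0 0.0 147236 7548 ? Ss 08:46 0:02 /usr/local/httpd-2.4.16/bin/httpd -k start';
my $re   = qr/root\s+(\d+)\s/;

# Scalar context: the match operator returns true (1) on success.
my $scalar_result = $line =~ $re;

# List context (note the parentheses): the match returns the captures.
my ($list_result) = $line =~ $re;

print "scalar context: $scalar_result\n";    # prints "scalar context: 1"
print "list context:   $list_result\n";      # prints "list context:   1435"
```

        Either form leaves the captured text in $1 after a successful match, which is why the say line in the original script still printed a PID.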