in reply to Regex and PID issue

The regex fails because of this portion: (\d*)\w+\s+. It captures zero or more digits (\d*), then requires one or more word characters (\w+), followed by one or more whitespace characters (\s+).

A digit is also a word character. Since the only thing immediately after the proc ID is whitespace, \w+ has to steal the last digit of the PID so that \s+ can match. It takes only one digit because \d* is greedy: it first gulps every digit it can, and when the rest of the regex fails to match, it backtracks one character at a time until it does.
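A minimal sketch of that backtracking, run against the ps line quoted further down. The full original regex isn't shown here, so the `ubuntu\s+` prefix is an assumption added only to anchor the match:

```perl
use strict;
use warnings;

# Sample ps line taken from the output quoted in this post.
my $line = 'ubuntu 9377 0.0 0.1 23668 1600 pts/2 S+ 18:23 0:00 top';

# Problem portion: \d* grabs "9377", then gives back one digit so that
# \w+ has something to match before the whitespace.
# (The "ubuntu\s+" prefix is an assumed anchor, not the original regex.)
my ($broken) = $line =~ /ubuntu\s+(\d*)\w+\s+/;

# Without \w+, the capture keeps every digit of the PID.
my ($fixed) = $line =~ /ubuntu\s+(\d+)\s+/;

print "with \\w+:    $broken\n";
print "without \\w+: $fixed\n";
```

The first capture should come out one digit short (937), the second should be the full PID (9377).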

The regex can be simplified a bit: qr|$owner\s+(\d+).*\s+$processName$|. Explained:

    qr|
        $owner         # proc owner
        \s+            # one or more whitespace
        (\d+)          # capture one or more digits
        .*             # everything up until...
        \s+            # the last whitespace
        $processName   # proc name
        $              # end of string
    |x

In action:

    use warnings;
    use strict;
    use feature 'say';

    getPids('top', 'ubuntu');

    sub getPids{
        my ($processName,$owner) = @_;
        my $ps;
        my $pid;
        my $command = "/bin/ps auwwwx | grep $processName | grep -v grep | grep -v 'sh -c' ";
        $ps = `$command`;
        # print $ps;
        my @lines = split( "\n",$ps);
        # print @lines;
        foreach my $line (@lines){
            if ($line =~ qr|$owner\s+(\d+).*\s+$processName$|){
                say "Found $processName in getPids() owned by $owner. PID is $1";
            }else{
                say "Loser";
            }
        }
    }

Output:

    Found top in getPids() owned by ubuntu. PID is 9377

    # orig: ubuntu 9377 0.0 0.1 23668 1600 pts/2 S+ 18:23 0:00 top

Re^2: Regex and PID issue
by JonesyJones (Novice) on Jun 16, 2016 at 18:43 UTC
    Thanks for the breakdown on the regex, I am new to it. I agree with you and the simplification. The problem I have is with $1. The value in there is missing the last digit.

      Did you actually read my post? Did you try my code? Did you compare my output to the actual command line output?

        I've resolved it with your help, see below:

        if ($line =~ qr|$owner\s+(\d+).*$processName|)

        The line above wound up being the winner.

        The line I was parsing is below, however it is abbreviated because the other JVM params and classpath show up after that. That's why the trailing $ didn't work. Don't know why the whitespace character before the processName didn't work.

        firtdev3 23052 0.3 0.3 5914496 419300 ? Sl 12:41 0:29 /opt/streets/vendorLib/java/linux/jdk1.7.0_45/bin/java -DprocessName=STP_INBOUND_INTERFACES.3
        I did run it. The regex doesn't match the output on a Red Hat system. I don't really have a problem with the expression.
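A likely reason the \s+ before $processName failed on the Java line: there the name is preceded by "=" (from -DprocessName=), not by whitespace. Below is a rough sketch of that, assuming $processName was STP_INBOUND_INTERFACES.3 (the thread doesn't say what was actually passed in). The trailing " -Dhypothetical=1" is a made-up stand-in for the JVM params that were cut from the quoted line, and \Q...\E is my addition so the "." in the name is matched literally.

```perl
use strict;
use warnings;

my $owner       = 'firtdev3';
my $processName = 'STP_INBOUND_INTERFACES.3';   # assumed value, not confirmed in the thread

# The ps line quoted in the thread, with the "+" wrap markers removed.
# ' -Dhypothetical=1' is a placeholder for the elided JVM params.
my $line = 'firtdev3 23052 0.3 0.3 5914496 419300 ? Sl 12:41 0:29 '
         . '/opt/streets/vendorLib/java/linux/jdk1.7.0_45/bin/java '
         . '-DprocessName=STP_INBOUND_INTERFACES.3'
         . ' -Dhypothetical=1';

# Requires whitespace right before the name: fails, the char there is "=".
my $with_space = ($line =~ m|$owner\s+(\d+).*\s+\Q$processName\E|) ? 1 : 0;

# Requires the name at end of line: fails, more params follow it.
my $with_anchor = ($line =~ m|$owner\s+(\d+).*\Q$processName\E$|) ? 1 : 0;

# Neither restriction: matches and captures the PID.
my ($pid) = $line =~ m|$owner\s+(\d+).*\Q$processName\E|;

print "with \\s+ before name: $with_space\n";
print "with trailing \$:      $with_anchor\n";
print "relaxed regex PID:    $pid\n";
```

If the assumption about $processName holds, only the relaxed pattern matches, which lines up with the version that wound up being the winner above.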