blink has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse some output generated by someone else's script, which uses the User::Utmpx module. I want to break each line up into three capture buffers ($1, $2, $3), as shown below:

Here's the actual data:

/var/adm/wtmpx begins on Fri Jun 7 16:40:21 2002 (488 days) ago
flexlm FlexLM license manag Never logged in
m12345 Shmoe,Ronald K. Last login Thu May 1 14:11:54 2003 (160 days ago)
m54321 O'Schmoe,Karen D. Last login Sun Sep 29 15:39:43 2002 (374 days ago)
mresw3 Joe-Schmoe,Mira Never logged in
mdsjlk Schmoe, Robert L. Last login Thu Sep 12 15:41:11 2002 (391 days ago)
oracle OraSchmoe Oracle user Never logged in
sshd OpenSchmoe Privilege Se Never logged in
web suitespot,httpd serv Never logged in
The capture buffers should be populated like this (for example):
$1 = m54321
$2 = O'Schmoe,Karen D.
$3 = Last login Sun Sep 29 15:39:43 2002 (374 days ago)
The problem I'm running into is dealing with the variable whitespace.

For those interested, here's what I've tried; none of it works, obviously:

$line =~ /(\w+)\s+(.*?)\s{2,9}(.*?)/;
$line =~ s/(\w+)\s+(.*?)\s{2,9}(.*?)/$1 $2 $3/;
....and a few other broken permutations of the above.

If anyone is interested, here's the context: we have a program that returns a list like the data above, namely the users who have not logged into a given server for N days.

My plan is to run this program on each server in our infrastructure and mail the output to an email alias, which will deliver it to the script I'm asking for help with. That script will parse the incoming data, build a report of which machines each user has an "old" account on, and finally append the report to a log file.

The resultant data structure would contain:

<username>
<server 1> <last login>
<server 2> <last login>
<server 3> <last login>

....and so on.
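One way to hold that structure in Perl is a hash of hashes keyed first by username and then by server. The sketch below assumes each server's report has already been split into lines and tagged with a server name; the names serverA/serverB and the serverB sample line are hypothetical, and the pattern simply keys on the literal "Last"/"Never" text to find the third column.

```perl
use strict;
use warnings;

# Hypothetical per-server reports; in practice each would arrive by mail,
# with the server name taken from the message (an assumption here).
my %reports = (
    'serverA' => [
        q{m12345     Shmoe,Ronald K.        Last login Thu May 1 14:11:54 2003 (160 days ago)},
        q{flexlm     FlexLM license manag   Never logged in},
    ],
    'serverB' => [
        q{m12345     Shmoe,Ronald K.        Never logged in},
    ],
);

# $old{username}{server} = last-login text
my %old;
for my $server (sort keys %reports) {
    for my $line (@{ $reports{$server} }) {
        # Skip anything that isn't a user line (e.g. the wtmpx header).
        next unless $line =~ /^(\S+)\s+(.+?)\s*((?:Last|Never).+)/;
        $old{$1}{$server} = $3;
    }
}

# One block per user, one indented line per server.
for my $user (sort keys %old) {
    print "$user\n";
    for my $server (sort keys %{ $old{$user} }) {
        print "    $server $old{$user}{$server}\n";
    }
}
```

Sorting the keys keeps the log output stable from run to run, which makes successive reports easier to compare.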

Edit by thelenm: fixed a broken code tag

Re: more than one whitespace char as word anchor
by delirium (Chaplain) on Oct 10, 2003 at 21:03 UTC
        $line =~ /(\w+)\s+(.*?)\s{2,9}(.*?)/ ;

    (.*?) matches zero or more of any character, as few as possible. Because nothing follows the final (.*?) in your pattern, it is satisfied by zero characters, so $3 always comes back empty. My approach would be to worry less about the spaces and concentrate on the fact that "Last" and "Never" can serve as delimiters. For example:

    while (my $line = <>) {
        # skip lines that don't match (e.g. the header) so stale $1..$3 aren't reprinted
        next unless $line =~ /(\S+)\s+(.+?)\s*((?:Last|Never).+)/;
        print "$1::$2::$3\n";
    }

    With your input, this produces:

    flexlm::FlexLM license manag::Never logged in
    m12345::Shmoe,Ronald K.::Last login Thu May 1 14:11:54 2003 (160 days ago)
    m54321::O'Schmoe,Karen D.::Last login Sun Sep 29 15:39:43 2002 (374 days ago)
    mresw3::Joe-Schmoe,Mira::Never logged in
    mdsjlk::Schmoe, Robert L.::Last login Thu Sep 12 15:41:11 2002 (391 days ago)
    oracle::OraSchmoe Oracle user::Never logged in
    sshd::OpenSchmoe Privilege Se::Never logged in
    web::suitespot,httpd serv::Never logged in

    ...which appears to split each line the way you want.
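    To see the difference concretely, here is a small self-contained check that runs the original pattern and delirium's pattern on one sample line. The run of several spaces between columns is an assumption, since the posted data lost its original spacing.

```perl
use strict;
use warnings;

# Assumed spacing: columns separated by runs of spaces.
my $line = q{m54321     O'Schmoe,Karen D.      Last login Sun Sep 29 15:39:43 2002 (374 days ago)};

# Original attempt: the trailing lazy (.*?) has nothing after it,
# so it matches the empty string.
my @orig = $line =~ /(\w+)\s+(.*?)\s{2,9}(.*?)/;

# delirium's version: the third field starts at the literal "Last" or "Never".
my @fixed = $line =~ /(\S+)\s+(.+?)\s*((?:Last|Never).+)/;

print "original: [", join('][', @orig), "]\n";
print "fixed:    [", join('][', @fixed), "]\n";
```

    With this spacing the original pattern captures the username and name correctly but leaves the third field empty; if the columns were separated by single spaces, \s{2,9} would never match at all.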

Re: more than one whitespace char as word anchor
by liz (Monsignor) on Oct 10, 2003 at 20:57 UTC
    From your example, it looks like the fields are in fact tab-delimited (\t). Have you tried splitting on "\t"?
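    If the data really is tab-delimited (not confirmed; the posted copy lost its whitespace), a minimal sketch with split needs no regex trickery at all. The sample line below is constructed with explicit tabs for illustration.

```perl
use strict;
use warnings;

# Assumes (unverified) that the three columns are separated by tabs.
my $line = "m54321\tO'Schmoe,Karen D.\tLast login Sun Sep 29 15:39:43 2002 (374 days ago)\n";
chomp $line;

# \t+ tolerates several tabs used for column alignment;
# the limit of 3 leaves any tabs inside the last field intact.
my ($user, $name, $login) = split /\t+/, $line, 3;
print "$user | $name | $login\n";
```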

    Otherwise, it would seem to me that you need to match on known entities. Something like:

    $line =~ m#^(\w+)\s+(.*?)\s+((?:Never|Last).*)#
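    A quick sketch of that pattern applied to a few sample lines (spacing assumed, as the posted data lost it). Because of the ^ anchor and \w+, the wtmpx header line, which starts with "/", is rejected rather than producing a bogus record.

```perl
use strict;
use warnings;

# Sample lines; the column spacing is assumed.
my @lines = (
    '/var/adm/wtmpx begins on Fri Jun 7 16:40:21 2002 (488 days) ago',
    'mresw3     Joe-Schmoe,Mira        Never logged in',
    'mdsjlk     Schmoe, Robert L.      Last login Thu Sep 12 15:41:11 2002 (391 days ago)',
);

my @records;
for my $line (@lines) {
    # Only lines beginning with a word-character username can match.
    next unless $line =~ m#^(\w+)\s+(.*?)\s+((?:Never|Last).*)#;
    push @records, [ $1, $2, $3 ];
}

print join('::', @$_), "\n" for @records;
```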

    Liz