more than one whitespace char as word anchor

blink has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse some output generated by someone else's script, which uses the User::Utmpx module. I want to break each line up into three capture buffers ($1, $2, $3), as shown below:

Here's the actual data:

 /var/adm/wtmpx begins on Fri Jun  7 16:40:21 2002 (488 days) ago
flexlm   FlexLM license manag Never logged in
m12345   Shmoe,Ronald K.  Last login Thu May  1 14:11:54 2003 (160 days ago)
m54321   O'Schmoe,Karen D.     Last login Sun Sep 29 15:39:43 2002 (374 days ago)
mresw3   Joe-Schmoe,Mira  Never logged in
mdsjlk   Schmoe, Robert L.    Last login Thu Sep 12 15:41:11 2002 (391 days ago)
oracle   OraSchmoe Oracle user   Never logged in
sshd     OpenSchmoe Privilege Se Never logged in
web      suitespot,httpd serv Never logged in

For example, the capture buffers should be populated like this:

$1 = m54321
$2 = O'Schmoe,Karen D.
$3 = Last login Sun Sep 29 15:39:43 2002 (374 days ago)

The problem I'm running into is the variable amount of whitespace between the fields.

For those interested, here's what I've tried (none of which works, obviously):

$line =~ /(\w+)\s+(.*?)\s{2,9}(.*?)/ ;

$line =~ s/(\w+)\s+(.*?)\s{2,9}(.*?)/$1 $2 $3/;

....and a few other broken permutations of the above.
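A small, self-contained check (two sample lines copied from the data above) shows what goes wrong with the first attempt. The trailing lazy `(.*?)` has nothing after it, so it is satisfied by the empty string and `$3` always comes back empty. Anchoring that group with `$` fixes the empty `$3`, but counting spaces is still fragile: the `flexlm` line has only single spaces after the login, so the engine lets `\s+` give back spaces and "finds" a two-space run right after the login. The `\s{2,}`/`$` variant is my own illustration, not something proposed in the thread.

```perl
use strict;
use warnings;

# Two sample lines from the question (wrapped lines rejoined).
my @lines = (
    "m54321   O'Schmoe,Karen D.     Last login Sun Sep 29 15:39:43 2002 (374 days ago)",
    "flexlm   FlexLM license manag Never logged in",
);

my @results;
for my $line (@lines) {
    # The original attempt: the final (.*?) has nothing after it,
    # so the lazy quantifier stops at zero characters.
    my @orig = $line =~ /(\w+)\s+(.*?)\s{2,9}(.*?)/;

    # Anchoring the last group at end of line makes it take the rest,
    # but "two or more spaces" is still not a reliable field separator.
    my @anchored = $line =~ /^(\S+)\s+(.*?)\s{2,}(.*)$/;

    push @results, [ \@orig, \@anchored ];
    printf "orig:     [%s]\n", join '|', @orig;
    printf "anchored: [%s]\n", join '|', @anchored;
}
```

For the `m54321` line the original pattern yields an empty third field while the anchored one yields the full "Last login ..." text. For the `flexlm` line the anchored pattern still "succeeds", but with an empty name and everything after the login lumped into the third field, which is why anchoring on the status text itself is more robust.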

If anyone is interested, here's the context: we have a program that returns a list like the data above, namely the users who have not logged into a given server for N days.

My plan is to run this program on each server in our infrastructure and mail the output to an email alias, which delivers it to the script I'm asking for help with. That script will parse the incoming data, build a report of which machines each user has an "old" account on, and finally append the report to a log file.

The resulting data structure would contain:

....and so on.
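One possible shape for that per-user report is a hash of hashes keyed by login and then by host. This is a hypothetical sketch: the host name does not appear in the report format shown above, so how the script learns it (mail subject, sender address, ...) is an assumption, and `add_report`, `%old_accounts`, `hostA` and `hostB` are made-up names.

```perl
use strict;
use warnings;

# Hypothetical aggregate: login => { host => status text }.
# The host name is NOT part of the report format shown above; how the
# script learns it (mail subject, sender address, ...) is an assumption.
my %old_accounts;

# Parse one server's mailed report and record every account it lists.
sub add_report {
    my ($host, $text) = @_;
    for my $line (split /\n/, $text) {
        # Anchor on the status text; header lines simply don't match.
        next unless $line =~ /^(\S+)\s+(.+?)\s+((?:Last login|Never logged in).*)$/;
        $old_accounts{$1}{$host} = $3;
    }
    return;
}

# Illustrative input: lines from the question, spread over two made-up hosts.
add_report('hostA',
    "m12345   Shmoe,Ronald K.  Last login Thu May  1 14:11:54 2003 (160 days ago)\n"
  . "oracle   OraSchmoe Oracle user   Never logged in\n");
add_report('hostB', "oracle   OraSchmoe Oracle user   Never logged in\n");

# Report: one line per user listing the machines with an "old" account.
for my $login (sort keys %old_accounts) {
    print "$login: ", join(', ', sort keys %{ $old_accounts{$login} }), "\n";
}
```

With the sample input this prints `m12345: hostA` and `oracle: hostA, hostB`; appending that text to a log file is then a plain `open` in append mode.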

Edit by thelenm: fixed a broken code tag

Re: more than one whitespace char as word anchor by delirium (Chaplain) on Oct 10, 2003 at 21:03 UTC
`$line =~ /(\w+)\s+(.*?)\s{2,9}(.*?)/ ;`

`(.*?)` will match 0 or more of any character, and as few of them as possible. At the end of the pattern, with nothing after it to force a longer match, that means 0 characters.

My approach to this problem would be not to worry so much about the spaces, but to concentrate on the fact that "Last" and "Never" are possible delimiters. For example:

`while (my $line = <>) { next unless $line =~ /(\S+)\s+(.+?)\s+((?:Last|Never).+)/; print "$1::$2::$3\n"; }`

With your input, this produces:

flexlm::FlexLM license manag::Never logged in
m12345::Shmoe,Ronald K.::Last login Thu May  1 14:11:54 2003 (160 days ago)
m54321::O'Schmoe,Karen D.::Last login Sun Sep 29 15:39:43 2002 (374 days ago)
mresw3::Joe-Schmoe,Mira::Never logged in
mdsjlk::Schmoe, Robert L.::Last login Thu Sep 12 15:41:11 2002 (391 days ago)
oracle::OraSchmoe Oracle user::Never logged in
sshd::OpenSchmoe Privilege Se::Never logged in
web::suitespot,httpd serv::Never logged in

...which appears to be split out the way you want.
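To check delirium's keyword-anchoring idea against the whole sample, here is a self-contained version with the question's data embedded in a heredoc (wrapped lines rejoined). Two small deviations are my own assumptions: the keywords are spelled out as `Last login` and `Never logged in` to make an accidental match inside a name less likely, and non-matching lines (such as the wtmpx header) are skipped, because after a failed match `$1`..`$3` still hold the values from the previous successful match.

```perl
use strict;
use warnings;

# Sample input copied from the question, wrapped lines rejoined.
my $data = <<'END';
 /var/adm/wtmpx begins on Fri Jun  7 16:40:21 2002 (488 days) ago
flexlm   FlexLM license manag Never logged in
m12345   Shmoe,Ronald K.  Last login Thu May  1 14:11:54 2003 (160 days ago)
m54321   O'Schmoe,Karen D.     Last login Sun Sep 29 15:39:43 2002 (374 days ago)
mresw3   Joe-Schmoe,Mira  Never logged in
mdsjlk   Schmoe, Robert L.    Last login Thu Sep 12 15:41:11 2002 (391 days ago)
oracle   OraSchmoe Oracle user   Never logged in
sshd     OpenSchmoe Privilege Se Never logged in
web      suitespot,httpd serv Never logged in
END

my @records;
for my $line (split /\n/, $data) {
    # Lazy (.+?) stops at the first whitespace run that is followed by the
    # status text, so the name keeps its internal spaces but no trailing ones.
    next unless $line =~ /^(\S+)\s+(.+?)\s+((?:Last login|Never logged in).*)$/;
    push @records, { login => $1, name => $2, status => $3 };
}

print join('::', @{$_}{qw(login name status)}), "\n" for @records;
```

Collecting the captures into hashes right after the match, rather than using `$1`..`$3` later, also keeps each record independent of whatever the next match does.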
Re: more than one whitespace char as word anchor by liz (Monsignor) on Oct 10, 2003 at 20:57 UTC
Judging by your example, it looks like the fields are in fact tab-delimited (\t). Have you tried splitting on "\t"?

Otherwise, it would seem to me that you need to match on known entities. Something like:

`$line =~ m#^(\w+)\s+(.*?)\s+((?:Never|Last).*)#`

Liz
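liz's two suggestions can be combined: try a tab split first and fall back to matching on the known status text. Whether the real output contains tabs is unverified (the posted sample shows spaces), so treat the tab branch as an assumption; the tab-separated test line and the `parse_line` helper are made up for illustration.

```perl
use strict;
use warnings;

# Split one report line into (login, name, status).
# Assumption: the fields *might* be tab-separated, as suggested above;
# the sample posted in the question shows spaces, so this is unverified.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # \t+ rather than \t, in case runs of tabs are used for alignment.
    my @fields = split /\t+/, $line;
    return @fields if @fields == 3;
    # Fallback: anchor on the known status text instead of the whitespace.
    # In list context this returns the three captures, or () on no match.
    return $line =~ /^(\S+)\s+(.+?)\s+((?:Last login|Never logged in).*)$/;
}

# The tab-separated line is made up; the space-separated one is from the question.
for my $line ("web\tsuitespot,httpd serv\tNever logged in\n",
              "web      suitespot,httpd serv Never logged in\n") {
    my @f = parse_line($line);
    print join('::', @f), "\n";
}
```

Both lines come out as `web::suitespot,httpd serv::Never logged in`; a header line such as the wtmpx banner yields an empty list, which the caller can test for.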