Sorry, I thought my clues would be enough for you to work it out. I'll be clearer.
You have a regex that contains a number of capturing brackets. Each set of those brackets sets an element in the list that is returned. As you've seen, any capturing brackets that don't take part in the match return undef.
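Here is a minimal sketch of that behaviour. It uses a made-up two-alternative pattern rather than the full process-line regex. The match still returns one element per capturing group, and the group whose alternative didn't match comes back as undef:

```perl
use strict;
use warnings;

# Two capturing groups inside one alternation (illustrative pattern):
# only one of them can take part in any given match.
my @parts = "2006" =~ /^(?:(\d{4})|(\w{3}\d{2}))$/;

# The list still has one element per capturing group; the group from the
# alternative that didn't match is undef.
print scalar(@parts), "\n";
print join(' | ', map { defined $_ ? $_ : 'undef' } @parts), "\n";
```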
So this code:
```perl
my @i ; # usern pid ? ? startt ? ? command
$i[0] = "wwwrun 17275 10449 0 2006 ? 00:00:00 /usr/sb...";
$i[1] = "root 3826 1 0 Jan08 ? 00:00:00 su -" ;
$i[2] = "root 3826 1 0 Jan 08 ? 00:00:00 su -" ;
$i[3] = "root 3547 1 2 06:49 ? 00:11:56 zmd /us...";
$i[4] = "root 3547 1 2 06:49:12 pts/1 00:11:56 zmd /us...";

foreach ( @i ) {
    my @proc = /^
        (\w+)                    # capture username
        \s+
        (\d+)                    # capture PID
        \s+\d+\s+\d+\s+
        (?:                      # cluster (not capturing)
            (\d{4})              # capture %Y
          |                      # or
            (\d{2}:\d{2})        # capture %H:%M
          |                      # or
            (\d{2}:\d{2}:\d{2})  # capture %H:%M:%S
          |                      # or
            (\w{3}\d{2})         # capture %b%d
          |                      # or
            (\w{3}\s+\d{2})      # capture %b %d
        )
        \s+\S+\s+\S+\s+          # skip 2 columns after the 5th column
        (.*)                     # capture the command
    $/gx;
    print join ' | ', map { defined() ? $_ : 'undef' } @proc;
    print "\n";
}
```
Gives the following output:
```
wwwrun | 17275 | 2006 | undef | undef | undef | undef | /usr/sb...
root | 3826 | undef | undef | undef | Jan08 | undef | su -
root | 3826 | undef | undef | undef | undef | Jan 08 | su -
root | 3547 | undef | 06:49 | undef | undef | undef | zmd /us...
root | 3547 | undef | undef | 06:49:12 | undef | undef | zmd /us...
```
So your problem is that the datetime value can end up in any of several columns of your output, depending on which alternative in the regex it matches.
Putting it even more simply, you have too many capturing brackets.
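Cut down to just the start-time part of the pattern, the effect is easy to see. This is a sketch that reuses the five formats from the regex above; the variable names are illustrative. With one capturing group per alternative, the value lands at a different index depending on its format. With a single group around the whole alternation, there is always exactly one element:

```perl
use strict;
use warnings;

# One capturing group per alternative (the original layout).
my $many = qr/^(?:(\d{4})|(\d{2}:\d{2})|(\d{2}:\d{2}:\d{2})|(\w{3}\d{2})|(\w{3}\s+\d{2}))$/;
# One capturing group around the whole alternation.
my $one  = qr/^(\d{4}|\d{2}:\d{2}|\d{2}:\d{2}:\d{2}|\w{3}\d{2}|\w{3}\s+\d{2})$/;

my (%index_many, %count_one);
for my $time ('2006', '06:49', '06:49:12', 'Jan08', 'Jan 08') {
    my @m = $time =~ $many;
    # Index of the single defined element among the five captures.
    my ($idx) = grep { defined $m[$_] } 0 .. $#m;
    $index_many{$time} = $idx;

    my @o = $time =~ $one;
    $count_one{$time} = scalar @o;
    printf "%-9s many groups: index %d of %d   one group: %s\n",
        $time, $idx, scalar @m, $o[0];
}
```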
Why not remove all of the nested brackets that match the different datetime formats and replace your outer (non-capturing) brackets with a single set of capturing brackets? That way, whichever alternative matches, it will always populate the same column in the output.
```perl
foreach ( @i ) {
    my @proc = /^
        (\w+)                    # capture username
        \s+
        (\d+)                    # capture PID
        \s+\d+\s+\d+\s+
        (                        # capture the start time, whichever format it is in
            \d{4}                # %Y
          |                      # or
            \d{2}:\d{2}          # %H:%M
          |                      # or
            \d{2}:\d{2}:\d{2}    # %H:%M:%S
          |                      # or
            \w{3}\d{2}           # %b%d
          |                      # or
            \w{3}\s+\d{2}        # %b %d
        )
        \s+\S+\s+\S+\s+          # skip 2 columns after the 5th column
        (.*)                     # capture the command
    $/gx;
    print join ' | ', @proc;
    print "\n";
}
```
Which produces the following output:
```
wwwrun | 17275 | 2006 | /usr/sb...
root | 3826 | Jan08 | su -
root | 3826 | Jan 08 | su -
root | 3547 | 06:49 | zmd /us...
root | 3547 | 06:49:12 | zmd /us...
```
The datetime value now always appears in the same (third) column.
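Because the fixed regex always returns four values in the same order, you can unpack each match straight into named variables. This is a sketch that reuses three of the sample lines above; the variable names and the hash layout are illustrative only:

```perl
use strict;
use warnings;

my @lines = (
    "wwwrun 17275 10449 0 2006 ? 00:00:00 /usr/sb...",
    "root 3826 1 0 Jan 08 ? 00:00:00 su -",
    "root 3547 1 2 06:49:12 pts/1 00:11:56 zmd /us...",
);

my @records;
for my $line (@lines) {
    # Four capturing groups, so a successful match always yields four values
    # in a fixed order; a failed match yields none and the line is skipped.
    my ($user, $pid, $start, $command) = $line =~ /^
        (\w+) \s+ (\d+) \s+ \d+ \s+ \d+ \s+
        (\d{4} | \d{2}:\d{2} | \d{2}:\d{2}:\d{2} | \w{3}\d{2} | \w{3}\s+\d{2})
        \s+ \S+ \s+ \S+ \s+
        (.*)
    $/x or next;
    push @records, { user => $user, pid => $pid, start => $start, command => $command };
}

print "$_->{user} ($_->{pid}) started $_->{start}: $_->{command}\n" for @records;
```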
See the Copyright notice on my home node.
"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
In reply to Re^2: parsing variable input (perlre problem) by davorg, in thread parsing variable input (perlre problem) by jeanluca