Sorry, I thought my clues would be enough for you to work it out. I'll be clearer.

You have a regex that contains a number of capturing brackets. Each of those sets of brackets sets one element in the list that the match returns. As you've seen, any capturing brackets that don't take part in the match return undef.

So this code:

    my @i;
    # usern pid ? ? startt ? ? command
    $i[0] = "wwwrun 17275 10449 0 2006 ? 00:00:00 /usr/sb...";
    $i[1] = "root 3826 1 0 Jan08 ? 00:00:00 su -";
    $i[2] = "root 3826 1 0 Jan 08 ? 00:00:00 su -";
    $i[3] = "root 3547 1 2 06:49 ? 00:11:56 zmd /us...";
    $i[4] = "root 3547 1 2 06:49:12 pts/1 00:11:56 zmd /us...";

    foreach ( @i ) {
        my @proc = /^
            (\w+)                       # capture username
            \s+
            (\d+)                       # capture PID
            \s+\d+\s+\d+\s+
            (?:                         # cluster (not capturing)
                (\d{4})                 # capture %Y
              |                         # or
                (\d{2}:\d{2})           # capture %H:%M
              |                         # or
                (\d{2}:\d{2}:\d{2})     # capture %H:%M:%S
              |                         # or
                (\w{3}\d{2})            # capture %b%d
              |                         # or
                (\w{3}\s+\d{2})         # capture %b %d
            )
            \s+\S+\s+\S+\s+             # skip 2 columns after the 5th column
            (.*)                        # capture the command
        $/gx;
        print join ' | ', map { defined($_) ? $_ : 'undef' } @proc;
        print "\n";
    }

Gives the following output:

    wwwrun | 17275 | 2006 | undef | undef | undef | undef | /usr/sb...
    root | 3826 | undef | undef | undef | Jan08 | undef | su -
    root | 3826 | undef | undef | undef | undef | Jan 08 | su -
    root | 3547 | undef | 06:49 | undef | undef | undef | zmd /us...
    root | 3547 | undef | undef | 06:49:12 | undef | undef | zmd /us...

So your problem is that the datetime value can land in any one of five positions in the returned list, depending on which alternative in the regex it matches.

Putting it even more simply, you have too many capturing brackets.

Why not remove all of the nested brackets that match the different types of datetime and replace your outer (non-capturing) brackets with one set of capturing brackets? That way, whichever alternative matches, it will always populate the same column in the output.

    foreach ( @i ) {
        my @proc = /^
            (\w+)                   # capture username
            \s+
            (\d+)                   # capture PID
            \s+\d+\s+\d+\s+
            (                       # capture the datetime, whichever form it takes
                \d{4}               # %Y
              |                     # or
                \d{2}:\d{2}         # %H:%M
              |                     # or
                \d{2}:\d{2}:\d{2}   # %H:%M:%S
              |                     # or
                \w{3}\d{2}          # %b%d
              |                     # or
                \w{3}\s+\d{2}       # %b %d
            )
            \s+\S+\s+\S+\s+         # skip 2 columns after the 5th column
            (.*)                    # capture the command
        $/gx;
        print join ' | ', @proc;
        print "\n";
    }

Which produces the following output:

    wwwrun | 17275 | 2006 | /usr/sb...
    root | 3826 | Jan08 | su -
    root | 3826 | Jan 08 | su -
    root | 3547 | 06:49 | zmd /us...
    root | 3547 | 06:49:12 | zmd /us...

With the datetime column always appearing in the same place.
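
If you would rather keep a separate capturing group for each datetime format, a branch-reset group is another option. This is only a sketch, and it assumes Perl 5.10 or later, where (?|...) makes every alternative number its groups from the same starting point. The name parse_ps_line is a hypothetical helper of mine, not something from your code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): parse one ps-style line.
# Returns (username, pid, datetime, command) in list context,
# or an empty list if the line does not match.
sub parse_ps_line {
    my ($line) = @_;
    return $line =~ /^
        (\w+)                       # capture username
        \s+
        (\d+)                       # capture PID
        \s+\d+\s+\d+\s+
        (?|                         # branch reset: every branch fills group 3
            (\d{4})                 # %Y
          |
            (\d{2}:\d{2})           # %H:%M
          |
            (\d{2}:\d{2}:\d{2})     # %H:%M:%S
          |
            (\w{3}\d{2})            # %b%d
          |
            (\w{3}\s+\d{2})         # %b %d
        )
        \s+\S+\s+\S+\s+             # skip 2 columns after the 5th column
        (.*)                        # capture the command
    $/x;
}

# Prints: root | 3826 | Jan08 | su -
print join(' | ', parse_ps_line('root 3826 1 0 Jan08 ? 00:00:00 su -')), "\n";
```

For this particular problem the single set of capturing brackets is simpler. Branch reset mostly earns its keep when the alternatives need their own internal groups.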

--

See the Copyright notice on my home node.

"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg


In reply to Re^2: parsing variable input (perlre problem) by davorg
in thread parsing variable input (perlre problem) by jeanluca
