in reply to Re^4: How to split unique patterns
in thread How to split unique patterns

Ah - if you want to capture different parts, you need to adjust the parentheses appropriately:

my $line= q{info::gmdate:2013-06-07 05:57:tccat_cico::r};
my @columns= qw( type tstype timestamp info1 info2 rest);
$line=~ /^(\w+)::(gmdate):(20\d\d-[01]\d-[0123]\d [012]\d:[0-6]\d):(\w+):(\w*):(.*)/
    or die "Malformed input [$line] in line $.";
my %info;
@info{ @columns }= ($1,$2,$3,$4,$5,$6);

Update: Fixed $info{ @columns }= ... to be the correct @info{ @columns }= ...
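The sigil matters here because @info{ ... } is a hash slice: it addresses several values of %info at once, while $info{ ... } addresses a single element and so cannot fill six keys in one assignment. A minimal sketch of the idea, using a shortened column list and hand-written values instead of $1 to $6 (the %by_hand hash exists only for comparison):

```perl
use strict;
use warnings;

my @columns = qw( type tstype timestamp );
my @values  = ( 'info', 'gmdate', '2013-06-07 05:57' );

my %info;
# Hash slice: each value is stored under the key at the same position.
@info{ @columns } = @values;

# The slice above has the same effect as these three scalar assignments:
my %by_hand;
$by_hand{type}      = 'info';
$by_hand{tstype}    = 'gmdate';
$by_hand{timestamp} = '2013-06-07 05:57';

print "$_ => $info{$_}\n" for @columns;
```

Keeping the key names in @columns means that adding a field later only needs one more name in the list and one more capture in the regular expression.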

Re^6: How to split unique patterns
by cornelius80 (Initiate) on Jun 10, 2013 at 09:59 UTC
    Hi Corion, Ohh MMmmyyy,...hmmm,..that really looks nasty,.. Ok,..let me have a look and try to decipher that,.. PS: it could really hasten the process if you could give a little explanation as you dissect each line, please? Kind Regards, Cornelius

      I'll comment on the regular expression:

      #         1      2        3                                        4     5     6
      $line=~ /^(\w+)::(gmdate):(20\d\d-[01]\d-[0123]\d [012]\d:[0-6]\d):(\w+):(\w*):(.*)/

      The stuff in parentheses fills in $1 to $6. These are commonly called "capturing parentheses" and are documented in perlre.
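      As a small sketch of how the captures line up with the numbers, the same match can be run in list context, where it returns the captured groups in the order of $1 to $6. The @captures array is a name introduced only for this illustration:

```perl
use strict;
use warnings;

my $line = q{info::gmdate:2013-06-07 05:57:tccat_cico::r};

# In list context a successful match returns the captured groups,
# in the same order as $1, $2, ... $6. A failed match returns an
# empty list, so the "or die" branch is taken.
my @captures = $line =~ /^(\w+)::(gmdate):(20\d\d-[01]\d-[0123]\d [012]\d:[0-6]\d):(\w+):(\w*):(.*)/
    or die "Malformed input [$line]";

for my $n ( 1 .. scalar @captures ) {
    printf "\$%d = '%s'\n", $n, $captures[ $n - 1 ];
}
```

      Note that the fifth capture is the empty string for this input, because (\w*) is allowed to match nothing between the two colons.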

      The first pair of parentheses captures a sequence of one or more word characters (\w+, meaning letters, digits and underscore), like info.

      The second pair captures the literal string gmdate. We could have left the capture out, but maybe we want to expand the RE later to allow for other strings in that place.
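      A sketch of what such an expansion could look like, using alternation inside the capture. The second tag, localdate, is invented purely for illustration and is not taken from the original data:

```perl
use strict;
use warnings;

# 'localdate' is a made-up second tag, used only for illustration.
my @lines = (
    q{info::gmdate:2013-06-07 05:57:tccat_cico::r},
    q{info::localdate:2013-06-07 07:57:tccat_cico::r},
);

my @tstypes;
for my $line (@lines) {
    # Because the tag is captured, $2 tells us which alternative matched.
    $line =~ /^(\w+)::(gmdate|localdate):/
        or die "Malformed input [$line]";
    push @tstypes, $2;
}
print "@tstypes\n";
```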

      The third pair captures something that looks like a YYYY-mm-dd HH:MM timestamp, with some basic validation thrown in:

      1. 20\d\d matches four digits that start with 20. For timestamps, this is sensible as it is unlikely that you will have to process timestamps from 1999, or timestamps in 2100.
      2. -[01]\d matches a minus followed by the digit 0 or 1, followed by another digit. This will capture something that vaguely looks like a month number, allowing numbers from 00 to 19. This is not exactly a month, but close enough. In particular, the match will fail if somebody puts in a YYYY-dd-mm timestamp with a day of 20 or above.
      3. -[0123]\d matches a minus followed by the digit 0, 1, 2 or 3, followed by another digit. This will match the day part of the date, allowing numbers from 00 to 39. It does not validate the day against the month, so the 30th February or 31st April will still match.
      4. [012]\d:[0-6]\d will match something that vaguely looks like HH:MM, with the hour between 00 and 29 and the minute between 00 and 69. I allow for 60 minutes because I mistook the field for seconds, and depending on the exact specifics of UTC, you can have timestamps with 60 seconds (a leap second), I believe. In any case, it's better to be lenient here.
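      To make the leniency concrete, here is a small sketch that compares the timestamp part of the pattern with a stricter variant. The stricter pattern is only one possible tightening and rests on an assumption about what should count as valid (months 01-12, days 01-31, hours 00-23, minutes 00-59); the ^ and $ anchors are added so each pattern can be tested on its own:

```perl
use strict;
use warnings;

# The lenient timestamp pattern from the post, and a stricter variant
# (an assumption: months 01-12, days 01-31, hours 00-23, minutes 00-59).
my $lenient = qr/^20\d\d-[01]\d-[0123]\d [012]\d:[0-6]\d$/;
my $strict  = qr/^20\d\d-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01]) (?:[01]\d|2[0-3]):[0-5]\d$/;

my %result;
for my $ts ( '2013-06-07 05:57', '2013-19-39 29:69', '2013-02-30 12:00' ) {
    # Store 1 or 0 for each pattern: [ lenient, strict ]
    $result{$ts} = [ ( $ts =~ $lenient ? 1 : 0 ), ( $ts =~ $strict ? 1 : 0 ) ];
    printf "%-18s lenient=%d strict=%d\n", $ts, @{ $result{$ts} };
}
```

      Even the stricter variant accepts the 30th of February, since a regular expression alone does not know how many days each month has. If that matters, a date module is probably the better tool for the final check.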

      If you can tell us where exactly you have problems with the regular expression, that will help us help you better.

        Hi Corion, Thank you so much for your patience on this. I really appreciate your explanation as it was very clear and precise. Kudos to you. Could I trouble you with just one more question, please? What does the following mean? @info{ @columns }= ($1,$2,$3,$4,$5,$6); Kind Regards, Cornelius

      ...give a little explanation as you dissect each line, please?

      Maybe you would like a little time with YAPE::Regex::Explain, a tool explaining regexes.

      It will be there for future reference and save some of Corion's time now.
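      A minimal sketch of how it could be used on the regular expression from this thread. YAPE::Regex::Explain is a CPAN module rather than part of core Perl, so the sketch loads it at run time and falls back to a hint if it is missing. The pattern is passed as a plain string on the assumption that the module copes better with that than with a compiled qr// object on newer Perls:

```perl
use strict;
use warnings;

# The pattern from the thread, as a plain single-quoted string.
my $pattern = '^(\w+)::(gmdate):(20\d\d-[01]\d-[0123]\d [012]\d:[0-6]\d):(\w+):(\w*):(.*)';

# Load the module at run time and degrade gracefully if it is not installed.
my $explanation;
if ( eval { require YAPE::Regex::Explain; 1 } ) {
    $explanation = YAPE::Regex::Explain->new($pattern)->explain;
    print $explanation;
}
else {
    print "YAPE::Regex::Explain is not installed; try: cpan YAPE::Regex::Explain\n";
}
```

      The output is a commented breakdown of the pattern, one element per line, which can be read alongside the manual explanation given earlier in the thread.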

      Cheers, Sören

      (hooked on the Perl Programming language)

        Hi Soren, Thank you...I will take a look now. Kind Regards, Cornelius