Cupojava has asked for the wisdom of the Perl Monks concerning the following question:

A little help... I need to get each word from the following string into a variable. I know a regular expression would work, but I am having a problem coding it.

"Fri Nov 8 15:00:02 2002:"

$1 = Fri
$2 = Nov
$3 = 8
$4 = 15:00:02
$5 = 2002

Replies are listed 'Best First'.
Re: Regular Exp parsing
by tadman (Prior) on Dec 13, 2002 at 18:14 UTC
    Since your data is delimited by spaces, you can really simplify this using split:
    chop($date_string); # Remove colon
    my @date = split(' ', $date_string);
    Now $date[0] is 'Fri', $date[1] is 'Nov' and so forth. If you're wary of using chop, you can always use substr instead.
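A minimal sketch of the split approach, assuming the input always has the five whitespace-separated fields shown in the question. It swaps chop for a substitution that removes the colon only when one is actually there; the variable names are just illustrative.

```perl
use strict;
use warnings;

my $date_string = "Fri Nov 8 15:00:02 2002:";

# Remove a trailing colon only if present. chop would remove whatever
# the last character happens to be, colon or not.
$date_string =~ s/:\z//;

# split ' ' splits on runs of whitespace and skips leading whitespace.
my ($wday, $mon, $mday, $time, $year) = split ' ', $date_string;

print "$wday|$mon|$mday|$time|$year\n";   # prints Fri|Nov|8|15:00:02|2002
```

Because split ' ' treats any run of whitespace as one separator, a single-digit day padded with an extra space should parse the same way.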
Re: Regular Exp parsing
by MarkM (Curate) on Dec 13, 2002 at 17:42 UTC

    The easiest way to parse a string such as this is:

    my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+):/;

    Broken down -- it scans $var for a specific pattern and returns the captured parts (the things in '()') as a list.

    If this is done in a loop, tag 'or next' on the end to make it skip lines that do not match.
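Here is a rough sketch of the 'or next' idiom in a loop. The sample lines and the @parsed array are made up for illustration, and the pattern is a slightly stricter variant (digits for the day, hh:mm:ss for the time) rather than anything prescribed above.

```perl
use strict;
use warnings;

# Hypothetical input lines; only two of them start with a timestamp.
my @lines = (
    "Fri Nov 8 15:00:02 2002: job started",
    "not a timestamp line",
    "Mon Dec  9 08:30:00 2002: job finished",
);

my @parsed;
for my $line (@lines) {
    # In list context the match returns the captures; an empty list
    # (no match) makes the assignment false, so 'or next' skips the line.
    my ($wday, $mon, $mday, $time, $year) =
        $line =~ /\A(\S+)\s+(\S+)\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+(\d{4}):/
        or next;
    push @parsed, "$year $mon $mday $time ($wday)";
}

print "$_\n" for @parsed;
```

The low-precedence 'or' matters here: with '||' the test would bind to the match alone instead of to the whole assignment.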

    NOTE: Updated in response to Cupojava's observation.

      Just for good marks and all... when you are dealing with a regex that will match a whole line, it is good form to use ^ (matches the beginning of a line) and $ (matches the end of a line) to speed processing. So your regex would change to:

      /^(\S+) (\S+) (\S+) (\S+) (\d+):$/;

      Also, if you truly wanted $1 to be set you could do so by just executing the regex statement provided by MarkM.

      So instead of:

      my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+) (\S+) (\S+) (\S+) (\d+):/;

      it would just be:

      $var =~ /\A(\S+) (\S+) (\S+) (\S+) (\d+):/;

      and $1 would be set to the first captured group, $2 to the second, and so on.

      These are read-only variables, so to modify them you would have to assign them to a separate variable, as MarkM has done. If, however, you just need to display or store the results, why generate additional variables?
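One reason to copy the captures anyway, sketched below under the assumption that more matching happens later in the same code: $1 and friends cannot be assigned to, and the next successful match replaces them. The variable names ($assigned, $later) are illustrative only.

```perl
use strict;
use warnings;

my $var = "Fri Nov 8 15:00:02 2002:";

my ($wday, $mon);
if ($var =~ /\A(\S+) (\S+)/) {
    ($wday, $mon) = ($1, $2);   # copy right away, while $1 and $2 are fresh
}

# Assigning to $1 is a runtime error, so trap it with eval.
my $assigned = eval { $1 = "Sat"; 1 };
print "could not assign to \$1\n" unless $assigned;

# A later successful match replaces the old captures.
my $later;
if ($var =~ /(\d{4}):\z/) {
    $later = $1;
    print "\$1 is now $later, but \$wday is still $wday\n";
}
```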

      If you are doing this over a large number of entries, you might also want to look into optimizing your statements using lookaheads (I think that is the correct term), which let the regex check a condition before attempting to match any further into the string/line/block, etc.

      example

      (?:[SMTWF]) (warning: I know my syntax is off, so please don't use this)
      at the beginning of your regex should help to quickly skip those lines which do not start with a capital letter from the days of the week. Nifty, huh?
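For reference, (?:...) is a non-capturing group; the positive lookahead syntax is (?=...). A small sketch with made-up sample lines follows. Whether the lookahead actually speeds anything up is an assumption worth benchmarking, since the match already fails quickly on lines that do not fit.

```perl
use strict;
use warnings;

# Hypothetical input; the middle line does not start with a day name.
my @lines = (
    "Fri Nov 8 15:00:02 2002:",
    "error: disk full",
    "Tue Dec 3 09:15:40 2002:",
);

# (?=[SMTWF]) is a zero-width positive lookahead: it asserts that the
# next character is S, M, T, W or F without consuming it.
my @matched = grep {
    /\A(?=[SMTWF])(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d{4}):/
} @lines;

print "$_\n" for @matched;
```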

      Just my .02 cents since I love regex.

      Dave -- Saving the world one node at a time

        Zapowork: \A..\z is just as efficient as ^..$

        \A..\z should be used to anchor a pure string, whereas ^..$ should be used to anchor a line. For most cases, the difference is subtle enough that, virtually, there is no difference (this is why cookbook examples, and a lot of existing code, are able to get away with never using \A..\z). Still, it is proper to be accurate. If it is not expected or acceptable for a string to end with '\n', \z should be used instead of $.

        For example:

        if ($ARGV[0] =~ /^-o$/) { ... }

        Will match "-o" or "-o\n". For command line arguments, "-o\n" should not be allowed. The more accurate expression is:

        if ($ARGV[0] =~ /\A-o\z/) { ... }

        The reason I am so rigid about this point is that I have been hit by the difference in production code. I am now very strict about using \A..\z for strings and ^..$ for lines.
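The difference is easy to check with a short script. This is only a sketch; the %result hash and its keys are names invented for the demonstration.

```perl
use strict;
use warnings;

my %result;
for my $arg ("-o", "-o\n") {
    (my $shown = $arg) =~ s/\n/\\n/;   # make the newline visible when printing
    my $line_anchors   = ($arg =~ /^-o$/)   ? 1 : 0;   # $ also matches before a final newline
    my $string_anchors = ($arg =~ /\A-o\z/) ? 1 : 0;   # \z matches only at the very end
    $result{$shown} = [$line_anchors, $string_anchors];
    printf "%-4s  line anchors: %d  string anchors: %d\n",
        $shown, $line_anchors, $string_anchors;
}
```

Both patterns accept "-o", but only the ^..$ version also accepts "-o\n".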

      I see what you are doing, but it isn't going to work because mday is sometimes a single digit and sometimes two digits, so the number of spaces between the fields differs....
Re: Regular Exp parsing
by senik148 (Beadle) on Dec 13, 2002 at 18:43 UTC
    or this..
    $string = "Fri Nov 8 15:00:02 2002:"; my ($1, $2, $3, $4, $5) = split(/ /, $string);
      You can't use $1, $2, ... in my(). It would be far better to avoid using $1, $2, ... altogether and use properly named variables that do not have any scoping issues.
      or this..

      $string = "Fri Nov 8 15:00:02 2002:"; my ($1, $2, $3, $4, $5) = split(/ +/, $string);

      Noting that the + allows for one or more spaces.

        Oops, corrected as the use of $1, $2, etc. wouldn't work as pointed out above since they are "magic" variables used by regexps.

        $string = "Fri Nov 8 15:00:02 2002:"; my ($a, $b, $c, $d, $e) = split(/ +/, $string);

        Noting that the + allows for one or more spaces.
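To see what the + buys you, here is a small sketch comparing the two patterns on a string with a padded single-digit day (the sample string is an assumption about what such input looks like). It also uses named variables instead of $a and $b, which sort treats specially, and strips the colon that split leaves on the last field.

```perl
use strict;
use warnings;

my $string = "Mon Dec  9 08:30:00 2002:";   # two spaces before the 9

my @single = split / /,  $string;   # an empty field appears between the two spaces
my @multi  = split / +/, $string;   # a run of spaces counts as one separator

print scalar(@single), " fields with / /, ", scalar(@multi), " fields with / +/\n";

my ($wday, $mon, $mday, $time, $year) = @multi;
$year =~ s/:\z//;                   # drop the trailing colon from "2002:"

print "$wday|$mon|$mday|$time|$year\n";
```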