in reply to Regular Exp parsing

The easiest way to parse a string such as this is:

my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+)\s+(\S+)\s+(\S+)\s+(\d+):/;

Broken down: it matches $var against a regexp describing a specific pattern and, in list context, returns the captured groups (the things in '()') as a list.

If this is done in a loop, tack 'or next' onto the end to make it skip lines that do not match.
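
A minimal sketch of such a loop, assuming the lines look like the output of Perl's scalar localtime (for example "Fri Dec 13 18:58:00 2002"). The sample data and the @hours array are made up for illustration, and only the four fields the regex actually captures are assigned:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real data from the question is not shown.
my @lines = (
    'Fri Dec 13 18:58:00 2002',
    'not a timestamp at all',
    'Tue Dec  3 09:05:00 2002',
);

my @hours;
for my $var (@lines) {
    # In list context the match returns the captured groups.
    # On failure it returns an empty list, so 'or next' skips the line.
    my ($wday, $mon, $mday, $hour) =
        $var =~ /\A(\S+)\s+(\S+)\s+(\S+)\s+(\d+):/
        or next;
    push @hours, "$wday $mon $mday hour=$hour";
}

print "$_\n" for @hours;
```

The low-precedence 'or' is what makes this work: it applies to the whole list assignment, which evaluates to the number of values returned by the match.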

NOTE: Updated in response to Cupojava's observation.

Re: Re: Regular Exp parsing
by Zapawork (Scribe) on Dec 13, 2002 at 18:58 UTC
    Just for good marks and all: when you are dealing with a regex that will match a whole line, it is good form to use ^ (matches the beginning of a line) and $ (matches the end of a line) to speed processing. So your regex would change to:

    /^(\S+) (\S+) (\S+) (\d+):.*$/;

    Also, if you truly wanted $1 to be set you could do so by just executing the regex statement provided by MarkM.

    So instead of:

    my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+) (\S+) (\S+) (\d+):/;
    it would just be

    $var =~ /\A(\S+) (\S+) (\S+) (\d+):/;
    and $1 would be set to the first captured group, $2 to the second, and so on.

    These are read-only variables, so to modify the values you would have to copy them into separate variables, as MarkM has done. If, however, you just need to display or store the results, why generate additional variables?
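
    A small sketch of using $1, $2 and friends directly. It assumes a sample timestamp string (hypothetical, since the original input is not shown) and only touches the capture variables when the match succeeded, because after a failed match they still hold whatever the last successful match left in them:

    ```perl
    use strict;
    use warnings;

    my $var = 'Fri Dec 13 18:58:00 2002';   # hypothetical sample input

    my $summary = '';
    if ($var =~ /\A(\S+) (\S+) (\S+) (\d+):/) {
        # $1..$4 are only meaningful after a successful match.
        $summary = "day=$1 month=$2 mday=$3 hour=$4";

        # Copy a capture into an ordinary variable before changing it;
        # assigning to $2 itself dies with a read-only value error.
        my $month = uc $2;
        $summary .= " MONTH=$month";
    }
    print "$summary\n";
    ```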

    If you are doing this over a large number of entries, you might also want to look into optimizing your regex with lookaheads (I think that is the correct term), which let the regex check a condition at the current position before attempting to match any further into the string/line/block, etc.

    example

    (?=[SMTWF])
    placed at the beginning of your regex, this positive lookahead should help to quickly skip those lines which do not start with a capital letter from the days of the week, nifty huh?
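
    For reference, a hedged sketch of a positive lookahead, which is written (?=...), used as a cheap pre-check. The sample lines are invented, and whether this is measurably faster than the plain regex depends on the data and the Perl version, so treat it as an illustration of the syntax rather than a proven optimization:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample lines.
    my @lines = (
        'Fri Dec 13 18:58:00 2002',
        '# comment line',
        'Mon Dec 16 07:30:00 2002',
    );

    my @days;
    for my $line (@lines) {
        # (?=[SMTWF]) asserts that the next character is S, M, T, W or F
        # without consuming it, so (\S+) still captures the whole day name.
        next unless $line =~ /\A(?=[SMTWF])(\S+) (\S+) (\S+) (\d+):/;
        push @days, $1;
    }

    print "@days\n";
    ```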

    Just my .02 cents since I love regex.

    Dave -- Saving the world one node at a time

      Zapawork: \A..\z is just as efficient as ^..$

      \A..\z should be used to anchor a whole string, whereas ^..$ should be used to anchor a line. In most cases the difference is subtle enough that it makes virtually no practical difference (this is why cookbook examples and a lot of existing code get away with never using \A..\z). Still, it is proper to be accurate. If it is not expected or acceptable for a string to end with '\n', \z should be used instead of $.

      For example:

      if ($ARGV[0] =~ /^-o$/) { ... }

      Will match "-o" or "-o\n". For command line arguments, "-o\n" should not be allowed. The more accurate expression is:

      if ($ARGV[0] =~ /\A-o\z/) { ... }
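
      To make the difference concrete, here is a small sketch (the test values are made up) that runs both patterns against "-o" with and without a trailing newline:

      ```perl
      use strict;
      use warnings;

      my @args = ("-o", "-o\n");   # the second value has a trailing newline

      my (@dollar, @z);
      for my $arg (@args) {
          # Without /m, $ matches at the very end or just before a final newline.
          push @dollar, ($arg =~ /^-o$/   ? 1 : 0);
          # \z matches only at the very end of the string.
          push @z,      ($arg =~ /\A-o\z/ ? 1 : 0);
      }

      print 'with $:  ', "@dollar\n";
      print 'with \z: ', "@z\n";
      ```

      Only the \z version rejects the value that carries the newline.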

      The reason I am so rigid about this point is that I have been hit by the difference in production code. I am now very strict about using \A..\z for strings and ^..$ for lines.

        Hi Mark,

        That's great information. I didn't know that \n would not be literally matched when using $ as an anchor. I normally chomp all my strings before they get to that point, so I hadn't encountered it. Knowing this now, though, is there a reason why? Does $ assume EOL characters?

        BTW - Did you mean to put a \z in your initial example?

        Dave -- Saving the world one node at a time

Re: Re: Regular Exp parsing
by Cupojava (Novice) on Dec 13, 2002 at 21:05 UTC
    I see what you are doing, but it isn't going to work: mday is sometimes a single digit and sometimes two digits, which causes the number of spaces between the fields to differ.
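
    A quick sketch of the problem, assuming localtime-style input in which a single-digit day is padded with an extra space (the sample strings are hypothetical). A literal single space fails on the padded line, while \s+ accepts any run of whitespace:

    ```perl
    use strict;
    use warnings;

    my @samples = (
        'Fri Dec 13 18:58:00 2002',   # two-digit day, single spaces
        'Tue Dec  3 09:05:00 2002',   # one-digit day, padded with two spaces
    );

    my (@single, @flexible);
    for my $var (@samples) {
        # Literal single spaces: breaks when the day is padded.
        push @single,   ($var =~ /\A(\S+) (\S+) (\S+) (\d+):/       ? 1 : 0);
        # \s+ tolerates one or more whitespace characters between fields.
        push @flexible, ($var =~ /\A(\S+)\s+(\S+)\s+(\S+)\s+(\d+):/ ? 1 : 0);
    }

    print 'single space: ', "@single\n";
    print '\s+:          ', "@flexible\n";
    ```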