in reply to regex-matching the date

The regular expression you choose for any situation depends a great deal on how much you can trust your data to follow a pattern. Just for some examples (untested but should serve to illustrate):
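
As a rough sketch of that trade-off, here are three patterns for a "Sun May 13" style date, from trusting to strict. The patterns, the sample strings, and the helper name matching_patterns are all assumptions made for illustration, not taken from the original post.

```perl
use strict;
use warnings;

# Three patterns for a "Sun May 13" style date, from trusting to strict.
# All three are illustrative assumptions about what the data looks like.
my %pattern = (
    loose  => qr/(\w+)\s+(\w+)\s+(\d+)/,
    medium => qr/(\w{3})\s+(\w{3})\s+(\d{1,2})/,
    strict => qr/\b(Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s+
                 (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+
                 (\d{1,2})\b/x,
);

# Hypothetical helper: return the names of the patterns that match the text.
sub matching_patterns {
    my ($text) = @_;
    return grep { $text =~ $pattern{$_} } sort keys %pattern;
}

for my $text ('Sun May 13', 'see row 42') {
    printf "%-12s matches: %s\n", $text, join(', ', matching_patterns($text)) || 'none';
}
```

The looser patterns happily accept "see row 42", which is fine only if nothing like that can appear in your data.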

In summary, a regex matches only what you tell it to look for (if present), which may well not be a date. It may be wise to validate after the match, using something like Time::ParseDate, in which case you can choose a much simpler, less specific regex.
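
A minimal sketch of the match-then-validate idea follows. It swaps in the core Time::Local module rather than Time::ParseDate so that it runs without installing anything; the helper name find_valid_date and the year argument are assumptions, and the weekday name is not checked against the date.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %month_index;
@month_index{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

# Hypothetical helper: find a "Sun May 13" style candidate with a loose
# regex, then let timegm() decide whether month and day form a real date.
# The year is passed in because the matched text does not carry one.
sub find_valid_date {
    my ($text, $year) = @_;
    return unless $text =~ /(\w{3})\s+(\w{3})\s+(\d{1,2})/;
    my ($day, $month, $daynum) = ($1, $2, $3);
    return unless exists $month_index{$month};

    # timegm() croaks when the day is out of range for that month,
    # so a failed eval means the candidate is not a real date.
    my $epoch = eval { timegm(0, 0, 0, $daynum, $month_index{$month}, $year) };
    return unless defined $epoch;
    return "$day $month $daynum";
}

for my $text ('posted Sun May 13', 'posted Sun Feb 30', 'see row 42') {
    my $date = find_valid_date($text, 2001);
    print "$text => ", (defined $date ? $date : 'no valid date'), "\n";
}
```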

--
I'd like to be able to assign to an luser

Re: Re: regex-matching the date
by stuffy (Monk) on May 13, 2001 at 13:58 UTC
    Correct me if I'm wrong: in order to use the split function, I need to use the regex to find where the date is in the file, place it into a variable, and then split it? Even so, I think that is something I will use for formatting purposes. I finally got it working; I found I was making a newbie mistake: I was testing for the match, but I wasn't assigning it to a variable. My question now is about how I assigned it to a variable.
    if (/(\w{3}\s+\w{3}\s+\d+)/) { $foo = $&; }

    How does this differ from using
    $foo = $1;

    If I am running through a long file, and $foo is changing frequently, will one work better than the other?

    By the way, I like the last solution you used. The date will always be in the same format, and I am pretty sure that there will never be anything else with the same pattern, but then again never say never.

    Thanks for all the help. I'm currently struggling through other regex problems, but so far I have worked most of them out on my own, which I prefer to do before asking the monks.

    Stuffy

      From perldoc perlre:
        WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program.
      In other words, you can use $& (I used to; I'm an ex-sed hacker), but it will slow things down and it's kind of unmaintainable. $1, $2, $3, et cetera are really shinier, happier codelets.
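
      To make the difference concrete, here is a small sketch (the sample line and variable names are assumptions). $& holds everything the pattern matched, while $1 holds only what the first pair of parentheses captured; when the parentheses wrap the entire pattern, as in the code above, the two end up with the same text.

      ```perl
      use strict;
      use warnings;

      # Sample line is an assumption for illustration.
      my $line = 'logged on Sun May 13 by stuffy';

      my ($whole, $group);
      if ($line =~ /on\s+(\w{3}\s+\w{3}\s+\d+)/) {
          $whole = $&;   # everything the pattern matched, including "on "
          $group = $1;   # only what the first pair of parentheses captured
      }
      print "whole: $whole\ngroup: $group\n";
      ```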

      brother dep.

      --
      Laziness, Impatience, Hubris, and Generosity.

      Brother dep has covered the downside of $&, but on your split question, the beauty there is that you don't need to match the (sometimes) complex target of your interest, just the separators that mark where your interest ends, and that's often a lot easier. In this case, if you're going to verify the date anyway, there is not much sense in going to great lengths to do that in the regex, so you can just split on whitespace instead.
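
      A minimal sketch of the split approach, assuming a log line that always starts with the date (the sample line and variable names are made up for illustration):

      ```perl
      use strict;
      use warnings;

      # Assumed log line that always starts with the date.
      my $line = 'Sun May 13 13:58:00 connection opened';

      # split ' ' splits on runs of whitespace and skips leading whitespace;
      # the limit of 5 leaves the rest of the line in one piece.
      my ($day, $month, $daynum, $time, $rest) = split ' ', $line, 5;
      print "$day / $month / $daynum\n";
      ```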

      As you noted, split won't be able to find your dates at all. It is a great option if you are parsing some sort of log file in which the lines always start with that date format, but if you want to get that date out of the middle of a lot of other text, a specific regex would be my choice, and instead of split you can use $1 etc. to get your date components, like:

      if (/(\w{3})\s+(\w{3})\s+(\d+)/) { ($day, $month, $daynum) = ($1, $2, $3); }
      Finally, while we're talking about OWTDI, you might also consider unpack for jobs like this as it is usually faster, though it is even more fussy about the format of the data being consistent. It is however ideal for fixed-width columns of data (anyone else still dealing with data in card images?).
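
      A rough sketch of the unpack route, assuming a fixed-width layout in which the date always occupies the first ten columns (the sample line and the template are assumptions about the data):

      ```perl
      use strict;
      use warnings;

      # Assumed fixed-width layout: 3-char day name, space, 3-char month,
      # space, 2-char day of month, space, then free text.
      my $line = 'Sun May 13 connection opened';

      # A3 = 3 characters (trailing spaces stripped), x = skip one byte,
      # A* = everything that is left.
      my ($day, $month, $daynum, $rest) = unpack 'A3 x A3 x A2 x A*', $line;
      print "$day / $month / $daynum / $rest\n";
      ```

      Note that the A format strips trailing spaces only, so a single-digit day padded on the left would keep its leading space.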

      --
      I'd like to be able to assign to an luser