northwestdev has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to convert a date string of the form yyyy/mm/dd hh:mm:ss into a time (with the timelocal function). I am hoping someone has a regular expression that can help me extract the $hour, $min, $sec, $day, $month and $year needed by the timelocal function.

Re: Trouble with regular expression
by ig (Vicar) on Jun 08, 2009 at 05:55 UTC
Re: Trouble with regular expression
by CountZero (Bishop) on Jun 08, 2009 at 06:13 UTC
    Regexp::Common::time, from the Regexp::Common family of modules, will assist you here.

    If you are absolutely, positively certain of the format of your input, then you can also try (assuming your input is in $_):

    ($year, $month, $day, $hour, $min, $sec) = m|\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})|;
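A minimal sketch of how such a match could feed timelocal, assuming all six capture groups are properly parenthesized. The helper name parse_timestamp and the ^ and $ anchors are illustrative additions, not part of the pattern above. Note that timelocal counts months from 0, so the captured month is decremented.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: returns epoch seconds for "yyyy/mm/dd hh:mm:ss",
# or nothing (undef in scalar context) if the string does not match.
sub parse_timestamp {
    my ($stamp) = @_;
    my ($year, $month, $day, $hour, $min, $sec) =
        $stamp =~ m|^(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})$|
        or return;
    # timelocal expects sec, min, hour, day of month, month (0..11), year.
    return timelocal($sec, $min, $hour, $day, $month - 1, $year);
}

my $epoch = parse_timestamp('2009/06/08 05:55:00');
print defined $epoch ? "epoch: $epoch\n" : "no match\n";
```

The epoch value printed depends on the local time zone. Fields that match the pattern but are out of range (a month of 13, say) make timelocal die, so callers that cannot trust their input may want to wrap the call in eval.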

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Thanks. I think you're missing a ( after m|. I played around with it a little bit: \d+ instead of \d{4} and \d{2} also works.

        Your (\d+) works ... if you're willing to accept bad data: for example, a year with one, three, or more than four digits. You'd do well to read perlretut and friends re quantifiers.
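A small sketch comparing the two quantifier styles on good and bad input. The variable names and sample strings are made up for illustration, and the ^ and $ anchors are an added assumption: without them, \d{4} would still find four digits inside a longer run, so a five-digit year would slip through.

```perl
use strict;
use warnings;

# Loose: any number of digits per field. Strict: fixed field widths.
my $loose  = qr|^(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)$|;
my $strict = qr|^(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})$|;

for my $stamp ('2009/06/08 05:55:00', '9/6/8 5:55:0', '20091/06/08 05:55:00') {
    printf "%-22s loose=%-8s strict=%s\n", $stamp,
        ($stamp =~ $loose  ? 'match' : 'no match'),
        ($stamp =~ $strict ? 'match' : 'no match');
}
```

All three strings satisfy the loose pattern, while only the first satisfies the strict one.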

        Insisting on a regex is not always the best idea. That's implicit in CountZero's "absolutely, positively certain...." Update: it's also implicit in his excellent followup, written as I piddled about with this note.

        And, even more to the point, insisting on a regex suggests that you failed to grok the replies to your earlier thread, "String (date) to time (integer)".

        Indeed a missing "(": I blame it on my less than perfect "copy-and-paste"-fu!

        \d+ works, but as a general rule in a regex one should try to be as restrictive as possible, in order for the regex to pass only what is exactly right. For instance, \d+ allows something like "123456789" for any of the elements and that is surely wrong.

        By the same token, \d{2} isn't perfect either as it allows "99" for the month, day, hour, minute or second.
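One way to catch such values without writing tighter patterns is to let timelocal do the range checking, since it dies on out-of-range fields by default. This is only a sketch under that assumption; the helper name checked_epoch is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: epoch seconds for a well-formed, in-range stamp;
# undef (in scalar context) for anything else.
sub checked_epoch {
    my ($stamp) = @_;
    my ($year, $month, $day, $hour, $min, $sec) =
        $stamp =~ m|^(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})$|
        or return;
    # timelocal croaks on values such as month 99 or February 30;
    # eval turns that exception into an undefined result.
    my $epoch = eval { timelocal($sec, $min, $hour, $day, $month - 1, $year) };
    return $epoch;
}

for my $stamp ('2009/06/08 05:55:00', '2009/99/08 05:55:00', '2009/02/30 00:00:00') {
    my $epoch = checked_epoch($stamp);
    print "$stamp: ", (defined $epoch ? 'accepted' : 'rejected'), "\n";
}
```

Here the regex only checks the shape of the string, and the calendar logic (days per month, leap years) is left to Time::Local.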

        Regexp::Common::time has the following regex for the month number:

        /(?:(?=[01])(?:0[1-9]|1[012]))/
        That regex does not allow "wrong" numbers for the month and there are similar regexen for days, hours, ...
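To see what that month pattern accepts, here is a short sketch that tries it against a few candidates. The ^ and $ anchors and the variable names are added for the demonstration and are not part of the pattern as quoted.

```perl
use strict;
use warnings;

# The month pattern quoted above, anchored so the whole string must be a month.
my $month_re = qr/^(?:(?=[01])(?:0[1-9]|1[012]))$/;

for my $candidate (qw(01 09 10 12 00 13 99 7)) {
    printf "%-3s %s\n", $candidate, ($candidate =~ $month_re ? 'ok' : 'rejected');
}
```

The values 01, 09, 10 and 12 are accepted; 00, 13, 99 and the single-digit 7 are rejected.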

        In the end it all comes down to how much you trust your input to do no funny things.

        CountZero