Plankton has asked for the wisdom of the Perl Monks concerning the following question:

Friends,

I am trying to write a regex for matching "duration" strings. Basically durations are of the form PnYnMnDTnHnMnS with some extra rules which are described in the link provided. Here's what I have so far.
#!/usr/bin/perl -w

while(<DATA>) {
    chomp;
    if ( /^-{0,}P{1,1}T{0,}\d{1,}[Y|M|D|H|S]{1,1}/ ) {
        print "$_ matches\n";
    } else {
        print "$_ NO match\n";
    }
}
__DATA__
PnYnMnDTnHnMnS
-P1Y
P3Y3M3DT3H3M3S
P3M3Y3DT3H3M3S
P334Y3DT3H3M3S
PT2H
5T
PT
P5M6DT
P5M6DT9
P600T6S
And here's the output I get ...
PnYnMnDTnHnMnS NO match
-P1Y matches
P3Y3M3DT3H3M3S matches
P3M3Y3DT3H3M3S matches
P334Y3DT3H3M3S matches
PT2H matches
5T NO match
PT NO match
P5M6DT matches
P5M6DT9 matches
P600T6S NO match
As you can see, my regex isn't quite cutting it. P5M6DT, P3M3Y3DT3H3M3S, and P5M6DT9 should not match. And my tiny brain is fried!

Plankton: 1% Evil, 99% Hot Gas.

Re: duration regex
by Abigail-II (Bishop) on Oct 07, 2003 at 08:29 UTC
    This is my reading of the spec:
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Regexp::Common;

    my $dur = qr /-?                                # Optional leading minus.
                  P                                 # Required.
                  (?=[T\d])                         # Duration cannot be empty.
                  (?:(?!-) $RE{num}{int} Y)?        # Non-negative integer, Y (optional)
                  (?:(?!-) $RE{num}{int} M)?        # Non-negative integer, M (optional)
                  (?:(?!-) $RE{num}{int} D)?        # Non-negative integer, D (optional)
                  (?:T (?=\d)                       # T, must be followed by a digit.
                     (?:(?!-) $RE{num}{int} H)?     # Non-negative integer, H (optional)
                     (?:(?!-) $RE{num}{int} M)?     # Non-negative integer, M (optional)
                     (?:(?!-) $RE{num}{decimal} S)? # Non-negative decimal, S (optional)
                  )?                                # Entire T part is optional
                 /x;

    while (<DATA>) {
        chomp;
        print "$_ ", /^$dur$/ ? "matches\n" : "does not match\n";
    }

    __DATA__
    PnYnMnDTnHnMnS
    -P1Y
    P3Y3M3DT3H3M3S
    P3M3Y3DT3H3M3S
    P334Y3DT3H3M3S
    PT2H
    5T
    PT
    P5M6DT
    P5M6DT9
    P600T6S
    P1347Y
    P1347M
    P1Y2MT2H
    P0Y1347M
    P0Y1347M0D
    P-1347M
    -P1347M
    P1Y2MT
    P1Y2M
    P
    PT0S
    PT0.1234S

    Running this results in:

    PnYnMnDTnHnMnS does not match
    -P1Y matches
    P3Y3M3DT3H3M3S matches
    P3M3Y3DT3H3M3S does not match
    P334Y3DT3H3M3S matches
    PT2H matches
    5T does not match
    PT does not match
    P5M6DT does not match
    P5M6DT9 does not match
    P600T6S does not match
    P1347Y matches
    P1347M matches
    P1Y2MT2H matches
    P0Y1347M matches
    P0Y1347M0D matches
    P-1347M does not match
    -P1347M matches
    P1Y2MT does not match
    P1Y2M matches
    P does not match
    PT0S matches
    PT0.1234S matches

    Abigail

Re: duration regex
by Enlil (Parson) on Oct 07, 2003 at 07:18 UTC
    I took a look at the link and believe this does what you ask, though I made a couple of assumptions: that the order must always be PnYnMnDTnHnMnS, and that "the number of seconds can include decimal digits to an arbitrary precision" means (?:\d*\.\d+|\d+(?:\.\d+)?)?S.
    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        my $hr  = qr/\d+H/;
        my $min = qr/\d+M/;
        my $sec = qr/(?:\d*\.\d+|\d+(?:\.\d+)?)?S/;
        if ( m!^-?P(?:\d+Y)?(?:\d+M)?(?:\d+D)?
                (?:T
                    (?:$hr(?:$min)?(?:$sec)?
                    | $min(?:$sec)?
                    | $sec
                    )
                )?$!x ) {
            print "$_ matches\n"
        } else {
            print "$_ does not match\n"
        }
    }
    __DATA__
    PnYnMnDTnHnMnS
    -P1Y
    P3Y3M3DT3H3M3S
    P3M3Y3DT3H3M3S
    P334Y3DT3H3M3S
    PT2H
    5T
    PT
    P5M6DT
    P5M6DT9
    P600T6S
    __END__
    PnYnMnDTnHnMnS does not match
    -P1Y matches
    P3Y3M3DT3H3M3S matches
    P3M3Y3DT3H3M3S does not match
    P334Y3DT3H3M3S matches
    PT2H matches
    5T does not match
    PT does not match
    P5M6DT does not match
    P5M6DT9 does not match
    P600T6S does not match

    -enlil

Re: duration regex
by cLive ;-) (Prior) on Oct 07, 2003 at 06:02 UTC
    My head hurts too, but maybe a $ at the end of the regex wouldn't hurt :)

    Oh, and maybe a ? instead of {0,} - unless you want to match ----------
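
To illustrate the two suggestions, here is a minimal sketch comparing the pattern from the question with a variant that uses -? and a trailing $. The names $loose and $tight are made up for this example, and the tightened pattern still does nothing to enforce the Y/M/D/H/M/S order.

```perl
use strict;
use warnings;

# Pattern from the question: any number of minus signs, no anchor at the end.
my $loose = qr/^-{0,}P{1,1}T{0,}\d{1,}[Y|M|D|H|S]{1,1}/;

# Same pattern with the two suggested changes: at most one minus, anchored at the end.
my $tight = qr/^-?PT{0,}\d{1,}[Y|M|D|H|S]{1,1}$/;

for my $s ('-P1Y', '----P1Y', 'P5M6DT', 'PT2H') {
    printf "%-8s loose=%s tight=%s\n", $s,
        ($s =~ $loose ? 'yes' : 'no'),
        ($s =~ $tight ? 'yes' : 'no');
}
```

Here ----P1Y and P5M6DT pass the loose check but fail the tight one, while -P1Y and PT2H pass both.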

    .02

    cLive ;-)

Re: duration regex
by Roger (Parson) on Oct 07, 2003 at 06:59 UTC
    But how do you verify the order of appearance of Y followed by M followed by D followed by H followed by M followed by S (YMDHMS) with [Y|M|D|H|S]?
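
A related detail: inside a character class, | is a literal character rather than alternation, so [Y|M|D|H|S] also accepts a pipe, and a class on its own carries no ordering information. A small sketch, with sample strings invented for illustration:

```perl
use strict;
use warnings;

# One number followed by one character from the class used in the question.
my $class = qr/^P\d+[Y|M|D|H|S]$/;

for my $s ('P3Y', 'P3|', 'P3S') {
    print "$s ", ($s =~ $class ? "matches" : "does not match"), "\n";
}
```

All three strings match, including P3| and P3S, which has a seconds field without a T.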

    It sounds like you need a little more than a regular expression to parse the duration string.

    How about a function, say is_duration, that checks the duration string passed in by first tokenizing the string and then validating the tokens by iterating through them? Return an error when a bad token is encountered.

    You can have a look at parsers like Parse::RecDescent for doing this sort of thing, and it is not too hard to write such a validator yourself either.
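
A minimal sketch of such a validator follows. Only the name is_duration comes from the suggestion above; the token loop, the @allowed list, and the rule that only seconds may carry a fraction are assumptions made for illustration, not a reading of the spec.

```perl
use strict;
use warnings;

# Hypothetical validator: tokenize the string, then check each token
# against the designators still allowed at that point.
sub is_duration {
    my ($str) = @_;
    return 0 unless defined $str && $str =~ /\G-?P/gc;   # optional sign, required P

    my @allowed = qw(Y M D T H M S);   # required order; T separates date from time
    my $fields  = 0;                   # number/designator pairs seen so far
    my $need    = 1;                   # true while a field must still follow (after P or T)

    while (pos($str) < length($str)) {
        my ($num, $unit);
        if ($str =~ /\GT/gc) {
            ($num, $unit) = ('', 'T');
        }
        elsif ($str =~ /\G(\d+(?:\.\d+)?)([YMDHS])/gc) {
            ($num, $unit) = ($1, $2);
        }
        else {
            return 0;                  # bad token
        }

        # Assumption: only the seconds field may have a fractional part.
        return 0 if $num =~ /\./ && $unit ne 'S';

        # Skip designators that were omitted, but never skip past T implicitly,
        # so H and S (and a second M) are only accepted after an explicit T.
        shift @allowed while @allowed && $allowed[0] ne $unit && $allowed[0] ne 'T';
        return 0 unless @allowed && $allowed[0] eq $unit;
        shift @allowed;

        if ($unit eq 'T') { $need = 1 }
        else              { $need = 0; $fields++ }
    }
    return ($fields > 0 && !$need) ? 1 : 0;
}

for my $s (qw(-P1Y P3M3Y3DT3H3M3S PT2H P5M6DT PT0.5S)) {
    print "$s ", (is_duration($s) ? "matches" : "does not match"), "\n";
}
```

Keeping the designators in an ordered list means the months/minutes ambiguity of M resolves itself: before T the first M in the list is months, and after T only H, M, S remain.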

Re: duration regex
by BrowserUk (Patriarch) on Oct 07, 2003 at 08:47 UTC

    Update: Pleeaase don't vote++ for this! Having looked at Enlil's and Abigail's solutions, I realise that this is so over-specified and complicated that it is effectively garbage!

    The only saving grace is that I didn't post my first attempt. That was really complicated!

    I think this does the trick, though a few more testcases wouldn't go amiss.

    (Quickly) updated to correct my misreading of the spec. Date::Manip allows mixed negatives and positives in its deltas, and I assumed this did too!

    Results


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

    Edited by castaway: closed bold and em tags on update line.