cmm7825 has asked for the wisdom of the Perl Monks concerning the following question:

Strictly from a speed perspective, is it better to make a regex as precise as possible, or to keep it broad? For example, if I want to match the date "04/30/2010", should I use \d{2}\/\d{2}\/\d{4} or \S+? Thanks

Replies are listed 'Best First'.
Re: Regex: is it faster to be vague?
by jethro (Monsignor) on Apr 30, 2010 at 13:24 UTC
    Normally strict should be faster, because whenever you try to match something unmatchable the strict version will fail faster and therefore abort earlier.

    By contrast, the slightly faster parsing of the simpler regex is only a one-time gain.
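One way to check this claim on your own data is to time both patterns against input that contains a date and input that does not. The following is only a minimal sketch using the core Benchmark module; the sample lines and the case names are made up for illustration, and the numbers will vary by Perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: one line with a date, one without any digits.
my $with_date = 'order shipped on 04/30/2010 from warehouse 7';
my $no_date   = 'order shipped from warehouse seven, date unknown';

my $strict = qr{\d\d/\d\d/\d{4}};
my $vague  = qr{\S+};

# Sanity check before timing: the vague pattern accepts the dateless line,
# the strict pattern rejects it.
print "vague pattern matches the dateless line\n"  if $no_date =~ $vague;
print "strict pattern rejects the dateless line\n" if $no_date !~ $strict;

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    strict_with_date => sub { my $ok = $with_date =~ $strict },
    strict_no_date   => sub { my $ok = $no_date   =~ $strict },
    vague_with_date  => sub { my $ok = $with_date =~ $vague  },
    vague_no_date    => sub { my $ok = $no_date   =~ $vague  },
});
```

Note that \S+ succeeds on both lines, so a speed comparison between the two patterns only makes sense if a false positive is acceptable.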

Re: Regex: is it faster to be vague?
by ikegami (Patriarch) on Apr 30, 2010 at 16:08 UTC
Re: Regex: is it faster to be vague?
by johngg (Canon) on Apr 30, 2010 at 18:44 UTC

    Leaving speed aside, I'd choose a different regex delimiter to avoid "leaning toothpicks" syndrome and I'd use \d\d rather than \d{2} as it's shorter and easier to type.

    say q{Match!} if m{\d\d/\d\d/\d{4}};

    I hope this is of interest.

    Cheers,

    JohnGG

Re: Regex: is it faster to be vague?
by moritz (Cardinal) on Apr 30, 2010 at 15:18 UTC
    Using the literal 04/30/2010 should be fastest, because constant strings are searched for with a very fast algorithm.
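A small sketch of searching for the constant string, assuming the date arrives in a variable. The sample line and variable names are made up for illustration; \Q...\E and index() are standard Perl.

```perl
use strict;
use warnings;

my $date = '04/30/2010';
my $line = 'order shipped on 04/30/2010 from warehouse 7';

# \Q...\E quotes regex metacharacters in the interpolated string,
# so the pattern is treated as a pure literal.
print "regex: found\n" if $line =~ /\Q$date\E/;

# index() does a plain substring search and returns -1 when absent.
my $pos = index($line, $date);
print "index: found at offset $pos\n" if $pos >= 0;
```

Whether the literal regex or index() is quicker on a given Perl is something to measure, not assume.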
Re: Regex: is it faster to be vague?
by JavaFan (Canon) on Apr 30, 2010 at 22:19 UTC
    If you want to match the date 04/30/2010 then neither \d{2}\/\d{2}\/\d{4} nor \S+ is correct. Both will match ٦٦/٦٦/٦٦٦٦, or 98/76/5432 for that matter. Neither of them is actually 04/30/2010. If you just don't care about false positives, returning 1 instead of doing a match is the fastest way.
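The false positives are easy to reproduce. The sketch below compares \d with an explicit [0-9] class and with a pattern that also restricts the month and day ranges. The pattern names are illustrative, and the ranged pattern still accepts impossible dates such as 02/31/2010.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# ARABIC-INDIC DIGIT SIX is U+0666; build the string from escapes so the
# example does not depend on the source file's encoding.
my $six          = "\x{666}";
my $arabic_indic = join '/', $six x 2, $six x 2, $six x 4;
my $nonsense     = '98/76/5432';
my $real         = '04/30/2010';

my $loose  = qr{^\d\d/\d\d/\d{4}$};              # \d matches any Unicode digit here
my $ascii  = qr{^[0-9]{2}/[0-9]{2}/[0-9]{4}$};   # ASCII digits only
my $ranged = qr{^(?:0[1-9]|1[0-2])/(?:0[1-9]|[12][0-9]|3[01])/[0-9]{4}$};

for my $s ($arabic_indic, $nonsense, $real) {
    printf "%-12s loose:%-3s ascii:%-3s ranged:%-3s\n", $s,
        map { $s =~ $_ ? 'yes' : 'no' } $loose, $ascii, $ranged;
}
```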

    In general, being as precise as possible is fastest, because that allows Perl to fail early. Always remember this when benchmarking matches: benchmark failures as well. But that's the general case. There will be endless examples, and cases with additional assumptions, where vagueness wins. Those will be exceptions.

Re: Regex: is it faster to be vague?
by Marshall (Canon) on Apr 30, 2010 at 16:33 UTC
    If you want to match the exact date, then the fastest way is with a single string comparison. If you are sorting dates or looking for a close match (maybe binary search or whatever), a better format is: 2010-04-30. If you maintain 2 digits (with leading zeroes when necessary for month and day), this date format can be sorted as a single alpha string without having to break it into its component year-month-day numeric parts. The alpha sort will produce the correct less than, greater than or equal to result in a single cmp operation.
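Both suggestions can be sketched in a few lines. The helper to_iso is a hypothetical name introduced here for illustration, not something from the thread, and the sample dates are made up.

```perl
use strict;
use warnings;

# Exact match: plain string equality, no regex involved.
my $input = '04/30/2010';
print "exact match\n" if $input eq '04/30/2010';

# Hypothetical helper: convert MM/DD/YYYY to YYYY-MM-DD.
# Returns nothing if the input does not have that shape.
sub to_iso {
    my ($mdy) = @_;
    my ($m, $d, $y) = $mdy =~ m{^(\d\d)/(\d\d)/(\d{4})$}
        or return;
    return "$y-$m-$d";
}

my @dates  = map { to_iso($_) } qw(12/01/2009 04/30/2010 01/15/2010);
my @sorted = sort @dates;    # the default sort compares strings, like cmp
print "$_\n" for @sorted;
```

Because every field is zero-padded and ordered from most to least significant, the string order and the chronological order coincide.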