tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks. I'm doing quality assurance on a large number of free-form texts with pricing information.

Prices can have dot decimal with comma thousands separator, comma decimal with dot thousands separator, or no thousands separator at all. Maybe even no decimal. Since these are prices I'm expecting a currency, but sometimes this is missing so I'm not checking for that.

I'm cranking out a regex to do this, and tests to verify that everything works but... maybe a regex for this exists already? Something like Regexp::Common::Number::Promiscuous... ???

I didn't see anything in Regexp::Common or while googling around, but just in case I missed something, I thought I would ask the monks.

Thanks if anyone can help :)


Replies are listed 'Best First'.
Re: Promiscuously match what might be prices
by diotalevi (Canon) on Apr 10, 2006 at 14:29 UTC

    This is not difficult at all. You'll likely want to write a short test script that checks various strings against your $promiscuous_price with Test::More::like.

    use Regexp::Common;
    my $promiscuous_price = qr/
        (?:
            $RE{num}{real}
          | $RE{num}{real}{-sep => ','}
          | ...
        )
    /x

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Thanks diotalevi. I think your suggestion will be fine. I just didn't read the manual carefully enough.

      I wound up with

      my $promiscuous_price = qr{
          (?:
              $RE{num}{real}                                        # matches 123456.78
            | $RE{num}{real}{-radix => qr/[.]/}{-sep => qr/[,]/}    # matches 123,456.78
            | $RE{num}{real}{-radix => qr/[,]/}{-sep => qr/[.]/}    # matches 123.456,78
          )
      }x;
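
A minimal sketch of the kind of test script suggested above, using Test::More's like and unlike. To keep it runnable with core modules only, it assumes a hand-written stand-in pattern instead of the Regexp::Common one. The pattern and the sample strings are illustrative, not taken from the thread, and the stand-in is anchored so each string has to match as a whole.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the Regexp::Common-based pattern: three
# alternatives, anchored so that the whole string has to be a price.
my $promiscuous_price = qr/
    \A
    (?:
        \d{1,3} (?: ,  \d{3} )+ (?: \. \d+ )?   # 123,456.78  comma thousands, dot decimal
      | \d{1,3} (?: \. \d{3} )+ (?: ,  \d+ )?   # 123.456,78  dot thousands, comma decimal
      | \d+ (?: [.,] \d+ )?                     # 123456.78 or 123456,78 or 123456
    )
    \z
/x;

# Sample strings chosen for illustration only.
my @should_match = ( '123456.78', '123,456.78', '123.456,78', '123456', '1.234.567,89' );
my @should_fail  = ( 'abc', '1,23,456', '12.34.56', '' );

like(   $_, $promiscuous_price, "accepts '$_'" ) for @should_match;
unlike( $_, $promiscuous_price, "rejects '$_'" ) for @should_fail;

done_testing();
```

Swapping the stand-in for the real Regexp::Common pattern should only require changing the qr// assignment. The expected results may then differ, since that pattern is not anchored.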

        Unless you really need it, you should prefer saying \. and \, to [.] and [,]. You're disabling some important optimizations when you move literal text into a character class. Perl doesn't recognize that a single-character class is equivalent to a literal character, and this prevents its Boyer-Moore string matching from working effectively. That's one of the things that helps to make perl's regex engine so legendarily fast.
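
A small sketch to show that the two spellings accept exactly the same strings, so the swap is safe as far as matching goes. The patterns and sample strings are made up for illustration. It does not measure the optimization difference described above, only that the results agree.

```perl
use strict;
use warnings;

# The same literal separators written two ways (illustrative patterns).
my $escaped = qr/\A \d{1,3} (?: \, \d{3} )* \. \d\d \z/x;    # escaped literals
my $classed = qr/\A \d{1,3} (?: [,] \d{3} )* [.] \d\d \z/x;  # single-character classes

for my $text ( '1,234.56', '12.00', '1.234,56', '1234' ) {
    my $e = $text =~ $escaped ? 'match' : 'no match';
    my $c = $text =~ $classed ? 'match' : 'no match';
    print "$text: escaped=$e, class=$c\n";
}
```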

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊