Re: classifying data
by Abigail-II (Bishop) on Jan 19, 2004 at 16:32 UTC
|
Well, you could make use of Regexp::Common. For instance, by doing:
use Regexp::Common;
$has_dollar_sign = $str =~ s/^([-+]?)\$/$1/;
if ($str =~ /^$RE{num}{decimal}{-sep => ','}{-keep}$/) {
    $is_numeric        = 1;
    $sign              = $2;
    $has_decimal_point = $5 ? 1 : 0;
    $has_commas        = $str =~ /,/;
}
Or you could take its regex and modify it to have an
optional leading dollar sign.
Abigail
|
use warnings;
use strict;
while (<DATA>) {
    chomp;
    my $data = $_;
    print "$data: ";
    my $result = ($data =~ /^[+\-]?\$?[0-9.]+$/) ?
        "numeric" : "non-numeric";
    print "$result\n";
}
__DATA__
1020
$10.21
-1023
+1.024
beer
10$25
-$102.6
A1027
1028$
$
$-1.029
BTW, watch out for those asterisk operators in your patterns. You used \d*, and the '*' can legitimately match '', since it matches a digit zero or more times! That pesky asterisk operator can lead to 'zero-width' matches, which can drive you nuts when you are starting out with regular expressions.
Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
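To make the zero-width point concrete, here is a minimal sketch using core Perl only. The helper name whole_match is made up for illustration; it is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string matches the given pattern.
sub whole_match {
    my ($str, $re) = @_;
    return $str =~ /^(?:$re)$/ ? 1 : 0;
}

my $star = qr/\d*/;   # zero or more digits: also matches ''
my $plus = qr/\d+/;   # one or more digits: needs at least one

for my $str ('', '42', 'beer') {
    printf "%-6s  \\d*: %d  \\d+: %d\n",
        "'$str'", whole_match($str, $star), whole_match($str, $plus);
}

# Unanchored, \d* succeeds on any string by matching nothing at position 0.
my ($matched) = 'beer' =~ /(\d*)/;
print "unanchored \\d* on 'beer' matched '", $matched, "'\n";
```

The anchored \d* accepts the empty string, while \d+ does not. Unanchored, \d* reports success on 'beer' even though it consumed no characters.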
|
Your regexp will not match 900,000, but it
will match .....
Abigail
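A quick way to check both claims is to run the character-class pattern against those two strings. The second pattern, $tight, is only an assumption about what a stricter check might look like (digits, optional comma groups, at most one decimal point); nobody in the thread proposed it in this form.

```perl
use strict;
use warnings;

# The pattern from the post above: a character class of digits and dots.
my $loose = qr/^[+\-]?\$?[0-9.]+$/;

# Illustrative stricter pattern (an assumption, not from the thread):
# comma-grouped or plain digits, then at most one fractional part.
my $tight = qr/^[+\-]?\$?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?$/;

for my $str ('900,000', '.....', '$10.21', '1.2.3') {
    printf "%-8s loose: %-3s tight: %s\n", $str,
        ($str =~ $loose ? 'yes' : 'no'),
        ($str =~ $tight ? 'yes' : 'no');
}
```

The loose pattern rejects 900,000 because a comma is not in the class, and accepts ..... and 1.2.3 because nothing limits how many dots appear.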
|
Re: classifying data
by flatline (Novice) on Jan 19, 2004 at 19:19 UTC
|
This got really complex; I can understand your confusion. Does this work for you? I've tested it as thoroughly as I could think to in a few minutes:
#!perl
use strict;
my @data = qw/ +20 400.00 $5,000.00 -$860.7 $-26.01 90,000,000 blah test 45$99.0 $5+8.2 $,000 /;
for (@data) {
    if (/^(\$|-|\+|\$-|\$\+|-\$|\+\$|\d)/) {
        if (/^\d((,\d{3})|(\d*)|(\.\d{1,2}))+$/) {
            print "$_ is a numeric!\n";
        } elsif (/^(\$|-|\+|\$-|\$\+|-\$|\+\$)\d{1,3}((,\d{3})|(\d*)|(\.\d{1,2}))+$/) {
            print "$_ is a dollar amount!\n";
        } else {
            print "$_ is non-numeric!\n";
        }
    }
}
Re: classifying data
by halley (Prior) on Jan 19, 2004 at 17:38 UTC
|
If you're just trying to come up with a numeric/non-numeric classification, one regex *should* be okay for most cases.
$value = undef;
$value = (0 + "$1$3")
    if $thing =~ m/ ^
        (\-|\+)?            # optional sign: $1
        (\$)?               # optional dollar sign: $2
        (
            \d+             # at least one digit
            (,\d\d\d)*      # zero or more comma groups
            (\.\d*)?        # optional fractional part
          |
            (\.\d+)         # only a fractional part
        )                   # the whole mantissa: $3
    $ /x;
print "numeric! value = $value\n" if defined $value;
I haven't tested this, but it should cover all the basic cases without scientific notation. It assumes commas are thousands separators and the decimal point is the fraction separator. You might want to be lenient about leading and trailing spaces, or dollar-before-sign ($-34.00) cases.
-- [ e d @ h a l l e y . c c ]
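One caveat when turning the match into a value: Perl numifies only the leading numeric part of a string, so 0 + "1,234" yields 1. The sketch below strips the commas first, and also tries the two leniencies mentioned above (surrounding whitespace and dollar-before-sign). The function name parse_amount is hypothetical, and this is one possible reading of the idea rather than a tested replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the numeric value, or undef if not numeric.
sub parse_amount {
    my ($thing) = @_;
    return undef unless $thing =~ m/ ^ \s*
        (?: ([-+])? \$? | \$ ([-+]) )         # sign then dollar, or dollar then sign
        (
            \d+ (?: ,\d\d\d )* (?: \.\d* )?   # digits, comma groups, fraction
          |
            \.\d+                             # only a fractional part
        )
        \s* $ /x;
    my $sign     = defined $1 ? $1 : defined $2 ? $2 : '';
    my $mantissa = $3;
    $mantissa =~ tr/,//d;   # drop commas so "1,234" does not numify as 1
    return 0 + "$sign$mantissa";
}

for my $thing ('$-34.00', ' 1,234.50 ', '-$5,000.00', 'beer') {
    my $value = parse_amount($thing);
    print "'$thing' => ", (defined $value ? $value : 'not numeric'), "\n";
}
```

Like the pattern above, this still accepts a long first digit group such as 123456,123.00.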
|
This seems to accept 123456,123.00.
Be well.
|
Easily fixed, just give the first digit group an explicit count:
\d{1,3} # one to three digits
-- Spring: Forces, Coiled Again!
|
Yes, \d+(,\d{3})* will accept "123456,123.00". Perl accepts 123456_123.00 as one number, also. If you wish not to be so accepting, then you may have to deal with more than two choices in the key alternation. The suggested expression \d{1,3}(,\d{3})* would reject "123456123.00", since it lacks commas.
-- [ e d @ h a l l e y . c c ]
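As a rough sketch of what "more than two choices" could look like, the mantissa below has three branches: comma-grouped digits, plain digits, and fraction-only. The names $mantissa and is_mantissa are invented for this example, and the exact rules (for instance, requiring at least one comma group in the first branch) are assumptions.

```perl
use strict;
use warnings;

my $mantissa = qr/
    (?:
        \d{1,3} (?: ,\d{3} )+   # comma-grouped: 123,456,123
      |
        \d+                     # plain digits:  123456123
    )
    (?: \.\d* )?                # optional fractional part
  |
    \.\d+                       # only a fractional part
/x;

# Hypothetical helper: true if the whole string is an acceptable mantissa.
sub is_mantissa { return $_[0] =~ /^(?:$mantissa)$/ ? 1 : 0 }

for my $str ('123456,123.00', '123456123.00', '123,456,123.00', '.5', '1,23') {
    printf "%-16s %s\n", $str, is_mantissa($str) ? 'accepted' : 'rejected';
}
```

With this split, 123456,123.00 is rejected while both 123456123.00 and 123,456,123.00 are accepted.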
Another method.
by grendelkhan (Sexton) on Jan 19, 2004 at 22:38 UTC
|
Here's my take on it, which I now notice looks decidedly similar to at least one other example. Without the possibility of commas as thousands-separators, it's about half the length it is here.
Also, note that the plus/minus must always precede the dollar sign. (To fix this, make [+\-]?\$? into (([+\-]?\$?)|(\$?[+\-]?)). Can those parens be eliminated somewhat? I don't know off the top of my head.)
#!/usr/bin/perl -w
# bunched up: /^[+\-]?\$?((\d+(\.\d+)?)|(\d{1,3}(,\d\d\d)+(\.\d+)?))$/
$match = qr/^
    [+\-]?              # optional sign
    \$?                 # optional dollar
    (                   # either version
        (               # version without commas
            \d+         # one or more digits
            (\.\d+)?    # optional decimal-plus-more-digits
        )
      |
        (               # version with commas
            \d{1,3}     # one to three digits
            (,\d\d\d)+  # groups of three
            (\.\d+)?    # optional decimal-plus-more-digits
        )
    )
$/x;
while (<DATA>) {
    chomp;
    print "$_: ".(($_ =~ $match) ? 'num' : 'non-num')."\n";
}
__DATA__
1020
$10.21
-1023
+1.024
beer
10$25
-$102.6
A1027
10,000
15,00,000
10,000.001
1028$
$
$-1.029
--grendelkhan
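On the question of whether those parens can be trimmed: one possibility, offered as a sketch rather than a definitive answer, is a single non-capturing group whose second branch covers only the dollar-then-sign case, since the first branch already handles everything else. The variable names below are invented, and the comparison covers only a small hand-picked list of prefixes.

```perl
use strict;
use warnings;

my $verbose = qr/(([+\-]?\$?)|(\$?[+\-]?))/;   # as suggested in the post
my $compact = qr/(?:[+\-]?\$?|\$[+\-])/;       # non-capturing, no redundant branch

my @prefixes = ('', '+', '-', '$', '+$', '-$', '$+', '$-', '$$', '+-', '-$-');
my $disagreements = 0;
for my $p (@prefixes) {
    my $v = $p =~ /^$verbose$/ ? 1 : 0;
    my $c = $p =~ /^$compact$/ ? 1 : 0;
    $disagreements++ if $v != $c;
    printf "%-5s verbose: %d  compact: %d\n", "'$p'", $v, $c;
}
print "disagreements: $disagreements\n";
```

Agreement on this list does not prove the two forms are equivalent in general, but it covers each ordering of sign and dollar.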
Re: classifying data
by dominix (Deacon) on Jan 20, 2004 at 10:26 UTC
|
Re: classifying data
by Anonymous Monk on Jan 20, 2004 at 21:54 UTC
|
#!/usr/bin/perl -w
use strict;
my @data = qw/ this123isnot -$100 100_00 100$00 10000 +$100 $+100 this is not /;
foreach (@data) {
    if (/^(?:(?:[-+]?\$?)|(?:\$?[-+]))(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d{1,2})?$/) {
        print "$_ : numeric!\n";
    } else {
        print "$_ : non-numeric!\n";
    }
}
Remember, if you don't need the $1, $2, etc. variables, use non-capturing groups, (?: ), which are faster since they don't save the result.
This looks for the beginning of the line, then an optional - or + either followed or preceded by an optional $. Then \d{1,3}(?:,\d{3})* looks for 1-3 digits followed by zero or more groups of a comma and 3 digits, OR \d+ matches one or more digits. Then, for the decimal part, we check for the optional (?:\.\d{1,2})?, then the end of the line.
Also be careful: your original regex will match the empty string as well, since everything in it is optional.
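Both remarks can be checked with a short script. The original question's regex is not shown in this thread, so $all_optional below is a hypothetical stand-in in which every piece is optional; $numeric is the pattern from the code above.

```perl
use strict;
use warnings;

# The pattern from the post above, stored once with qr//.
my $numeric = qr/^(?:(?:[-+]?\$?)|(?:\$?[-+]))(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d{1,2})?$/;

# Hypothetical all-optional pattern (an assumption, not the original regex).
my $all_optional = qr/^[-+]?\$?\d*(?:\.\d*)?$/;

for my $str ('', '-$100', '1,000.50', '100_00') {
    printf "%-10s numeric: %-3s all-optional: %s\n", "'$str'",
        ($str =~ $numeric      ? 'yes' : 'no'),
        ($str =~ $all_optional ? 'yes' : 'no');
}

# Non-capturing groups leave no capture behind, so $1 stays undefined.
my $first_capture = 'unset';
if ('-$100' =~ $numeric) {
    $first_capture = defined $1 ? $1 : 'undef';
}
print "\$1 after a match: $first_capture\n";
```

The empty string slips through the all-optional pattern but not through $numeric, because the digit alternation there requires at least one digit.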