RobertJ has asked for the wisdom of the Perl Monks concerning the following question:

I have a small data base of the form

date TAB description TAB amount RETURN

The description can contain letters, numbers and various characters including . & - # etc

date is of format m/d/y and amount is of the format $dd.dd

I constructed the REGEX to strip out all but the amounts

[\t\/\d\w\ \-\#\*\&]*(\$[\d]*\.[\d]*)\r

and then save in a column

\1\r

to do this. It all works fine leaving only the amounts as a column. The problem occurs if there is a negative amount of the form -$dd.dd. I have been trying to use conditionals and alternations for about two hours with no luck. Would really like some enlightenment.

Thank you

I thought I posted this; however, can't see it. Excuse if I double posted.

For some reason I couldn't post this in Firefox, had to use Safari?

PROBLEM SOLVED

[\t\/\d\w\ \-\#\*\&]*\t(\-?)(\$[\d]*\.[\d]*)\r

\1\2\r

Still can't figure out why I can't post with Firefox V9

Re: REGEX Frustration
by GrandFather (Saint) on Dec 22, 2011 at 21:14 UTC

    What you describe is a CSV (character separated value) file and there are a plethora of modules available to handle that format for you. A good starting point is Text::CSV. Using such a module your problem turns into:

    use strict;
    use warnings;
    use Text::CSV;

    my $dataStr = <<DATA;
    1/1/1923\tFirst entry\t-23.45
    2/2/1924\t"Second entry with a long multi-line description.
    This is parsed fine using Text::CSV, but I bet your regex chokes."\t23.23
    DATA

    my $csv = Text::CSV->new({sep_char => "\t", binary => 1});

    open my $dataIn, '<', \$dataStr;

    while (my $row = $csv->getline($dataIn)) {
        my ($date, $description, $value) = @$row;
        print "Value is $value\n";
    }

    close $dataIn;

    Prints:

    Value is -23.45
    Value is 23.23

    As an aside, the regex magic you want is the ? quantifier, which makes the preceding item optional:

    use strict;
    use warnings;
    use Text::CSV;

    my $dataStr = <<DATA;
    1/1/1923\tFirst entry\t-23.45
    2/2/1924\t"Second entry with a long multi-line description.
    This is parsed fine using Text::CSV, but I bet your regex chokes."\t23.23
    DATA

    open my $dataIn, '<', \$dataStr;

    while (my $line = <$dataIn>) {
        next if $line !~ /(-?\d*\.\d*)/;
        print "Value is $1\n";
    }

    close $dataIn;

    Prints:

    Value is -23.45
    Value is .
    Value is .

    which you will note still fails in nasty ways, because the regex isn't nearly sufficient to extract the data you want. However, there are a huge number of errors in the regex you show, so you absolutely must go read some of the regex documentation: perlretut, perlre and perlreref for a start.
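One way to tighten the pattern, offered as a sketch rather than as code from this reply: anchor it to the last tab-separated field and require at least one digit on each side of the decimal point. The helper name last_amount and the sample lines are made up for illustration, and the optional dollar sign is an assumption based on the format in the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the amount if the line ends in a tab followed
# by an optional minus sign, an optional dollar sign, and digits.digits;
# otherwise return undef.
sub last_amount {
    my ($line) = @_;
    return $line =~ /\t(-?\$?\d+\.\d+)\s*$/ ? $1 : undef;
}

# Made-up sample lines; the middle one has no amount and is skipped.
for my $line ("1/1/1923\tFirst entry\t-23.45\n",
              "a description that ends in a period.\n",
              "2/2/1924\tSecond entry\t\$23.23\n") {
    my $amount = last_amount($line);
    print "Value is $amount\n" if defined $amount;
}
```

Because the match is anchored at the end of the line, a stray period in the description can no longer be mistaken for an amount.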

    True laziness is hard work
Re: REGEX Frustration
by ikegami (Patriarch) on Dec 22, 2011 at 20:49 UTC
    Regex can be used for validation or data extraction. You want to perform the latter, so no need to examine the data that closely. You have tab-separated data, so all you need to match your data is:
           _____________________ date
          /       ______________ desc
         /       /       _______ amount
        /       /       /
      ------  ------  ------
    /^[^\t]*\t[^\t]*\t[^\n]*\n\z/

    You want to remove the amount, so

    s/^[^\t]*\t[^\t]*\t\K[^\n]*//;       # 5.10+
    s/^([^\t]*\t[^\t]*\t)[^\n]*/$1/;     # Any version
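Since the original question asks to keep only the amount rather than remove it, the same tab-based pattern can be turned around. The following is a minimal sketch, not part of the reply above; the helper name amount_only and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the date and description fields (everything up
# to and including the second tab), leaving only the amount field.
sub amount_only {
    my ($line) = @_;
    $line =~ s/^[^\t]*\t[^\t]*\t//;
    return $line;
}

# Made-up sample records in the date TAB description TAB amount format.
print amount_only("12/22/2011\tCoffee & bagel #3\t\$4.50\n");
print amount_only("12/23/2011\tRefund - store\t-\$12.00\n");
```

Nothing in the pattern looks at the amount itself, so a leading minus sign needs no special handling.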
Re: REGEX Frustration
by roboticus (Chancellor) on Dec 22, 2011 at 21:15 UTC

    RobertJ:

    You may simply be working too hard. If your format is just as described, and you don't have the TAB character anywhere in your data, you could probably just use something like (untested):

    while (my $line = <INFILE>) {
        my ($date, $description, $amount) = split /\t/, $line;
        if (defined $amount) {
            print OUTFILE $amount;
        }
        else {
            # Amount not found...
        }
    }

    If you're really wanting to use a regular expression, though, something like this might be more appropriate:

    while (my $line = <INFILE>) {
        if ($line =~ /(-?\$\d*\.\d*)/) {
            print OUTFILE $1, "\n";
        }
        else {
            # Amount not found...
        }
    }

    In the regex, '-?' means "Maybe a minus sign". I left out the square brackets because they didn't really add anything; they just make a mess of things.

    ...roboticus

    Update: Corrected square bracket comment. (Kudos to GrandFather for the catch.)

    When your only tool is a hammer, all problems look like your thumb.

      Actually, leaving out the square brackets is required! It's not so much that they didn't add anything as that they subtracted a lot! By now maybe the OP has read the regex docs and will realise that square brackets denote a set of characters, and that the set matches just one character in the string being matched. It looks like the OP was thinking [] is equivalent to (). It's not.
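The difference can be seen in a small test. This is an illustrative sketch with made-up strings, not code from the thread: [ab]+ matches a run of characters drawn from the set, while (?:ab)+ (a non-capturing group) matches repetitions of the exact sequence "ab".

```perl
use strict;
use warnings;

# A character class matches ONE character from the set per repetition...
my $class_re = qr/^[ab]+$/;
# ...while a group matches the whole sequence "ab" per repetition.
my $group_re = qr/^(?:ab)+$/;

for my $str (qw(abab bbaa)) {
    printf "%-4s  class: %-3s  group: %s\n",
        $str,
        ($str =~ $class_re ? 'yes' : 'no'),
        ($str =~ $group_re ? 'yes' : 'no');
}
```

Both patterns accept "abab", but only the character class accepts "bbaa".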

      True laziness is hard work
Re: REGEX Frustration
by mr.nick (Chaplain) on Dec 22, 2011 at 21:08 UTC
    Since it's tab delimited, you could also use split to extract the amount field:

    my $amt = ((split(/\t/,$data))[2]);
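One detail to watch, sketched here with made-up data: the amount is the last field, so it still carries the line ending unless the line is chomped first. The helper name amount_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the third tab-separated field, or undef if
# the line has fewer than three fields.
sub amount_field {
    my ($line) = @_;
    chomp $line;                        # drop the trailing newline, if any
    return (split /\t/, $line)[2];
}

# Made-up sample records in the date TAB description TAB amount format.
for my $line ("12/22/2011\tCoffee & bagel\t\$4.50\n",
              "12/23/2011\tRefund\t-\$12.00\n") {
    my $amt = amount_field($line);
    print "$amt\n" if defined $amt;
}
```

As with the one-liner above, the sign of the amount does not matter because split only looks at the tabs.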

    mr.nick ...

Re: REGEX Frustration
by JavaFan (Canon) on Dec 22, 2011 at 21:24 UTC
Re: REGEX Frustration
by TJPride (Pilgrim) on Dec 22, 2011 at 22:48 UTC
    This should match ok (assuming the tabs come through in my sample data, anyway). Note that you'll want to change the \n in the output to \r if you want \r.

    use strict;
    use warnings;

    while (<DATA>) {
        # The dot is escaped so that it matches only a literal decimal point.
        if (m/\t(-?\$\d+\.\d+)$/) {
            print "$1\n";
        }
    }

    __DATA__
    12-31-2000	description	$44.44
    12-31-2000	description	-$44.44