chuckhazard has asked for the wisdom of the Perl Monks concerning the following question:

I am BOGGLED.

I have a file of check book data. Here's a sample line:

"Lunch Place", 1/22/2010, -4.110000, "Lunch", "", , 0

I want to run through the file, add up all the entries that say lunch and then see how much I spend a year on lunch.

# this is the format of the lines
# "Description", "Date", "Amount", "Category", "Memo", "Check Number", "Reconciled"
my $file = "checkdata.csv";
open( F, "$file" );
my @lines = <F>;
my $sum = 0;
foreach my $line ( @lines ) {
    chomp( $line );
    my @data = split( ",", $line );
    my $v = $data[2];
    my $vplus = $v + 1;
    # I tried doing this to take care of trailing zeros, leading space and minus.
    my $data[2] =~ /\s*(-?\d*\.\d*)0000/;
    my $v2 = $1;
    print " $v plus 1 is $vplus\n";
    print " $v2 is regexed\n";
    $sum = $sum + $data[2];
}
print $sum;

Here is a sample output line.

 -27.000000 plus 1 is 1
  is regexed

I am SO CLUELESS. I've used a lot of Perl in the past and I can't for the life of me figure out what's going on and I'm going INSANE. Please, someone explain this to my addled brain before I use up all my brain matter on a brick wall.

Replies are listed 'Best First'.
Re: Why won't it please be a number?
by NetWallah (Canon) on Jan 26, 2010 at 06:25 UTC
    Works fine after you fix the

    syntax error at *** line 21, near "$data["

    by removing the "my" in front of "my $data[2] =~ /\s*(-?\d*\.\d*)0000/;"

    I get:

     -4.110000 plus 1 is -3.11
     -4.11 is regexed

    after that correction, using your sample data.

    Also, please "use strict;", even for small programs.
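
    A minimal sketch of that fix, run against the sample line from the question. The sub name extract_amount is a hypothetical helper, not something from the original script; the regex is the poster's own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pull the amount field out of
# one line and return the captured number, or undef if nothing matched.
sub extract_amount {
    my ($line) = @_;
    my @data = split /,/, $line;
    return unless defined $data[2];

    # No "my" here: $data[2] already exists, we only match against it.
    if ( $data[2] =~ /\s*(-?\d*\.\d*)0000/ ) {
        return $1;
    }
    return;
}

my $sample = '"Lunch Place", 1/22/2010, -4.110000, "Lunch", "", , 0';
my $amount = extract_amount($sample);
print "$amount is regexed\n";    # prints "-4.11 is regexed"
```

    With "use strict;" in place, a stray "my" like the one in the question is reported at compile time instead of producing puzzling output.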

         Theory is when you know something, but it doesn't work.
        Practice is when something works, but you don't know why it works.
        Programmers combine Theory and Practice: Nothing works and they don't know why.         -Anonymous

Re: Why won't it please be a number?
by ikegami (Patriarch) on Jan 26, 2010 at 05:57 UTC
    my creates a var, so my $data[2] =~ ... makes no sense. An array element isn't a variable that can be created that way, and matching against a new var is not gonna do much. Get rid of that my.

    use strict;
    use warnings;

    use Text::CSV_XS qw( );

    my $file = "checkdata.csv";

    # allow_whitespace accepts the blank after each comma in the sample data
    my $csv = Text::CSV_XS->new({ binary => 1, allow_whitespace => 1 })
        or die("Cannot create CSV object: " . Text::CSV_XS->error_diag());

    open(my $fh, '<', $file)
        or die("Can't open file \"$file\": $!\n");

    my $sum = 0;
    while (my $row = $csv->getline($fh)) {
        next if $row->[3] ne 'Lunch';    # skip everything that isn't lunch
        ( my $amt = $row->[2] ) =~ s/^\s+//;
        printf("%5.2f\n", $amt);
        $sum += $amt;
    }

    $csv->eof
        or die("Error parsing file: ", $csv->error_diag());

    printf("Total: %5.2f\n", $sum);

Re: Why won't it please be a number?
by ikegami (Patriarch) on Jan 26, 2010 at 06:19 UTC

    -27.000000 plus 1 is 1

    No it's not. Not even if you left in the space.

    >perl -wle"print ' -27.000000'+1"
    -26
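
    Perl's string-to-number conversion already copes with a leading space, a minus sign and trailing zeros, so the regex in the question isn't strictly needed for the arithmetic. A minimal sketch, where to_number is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: let Perl's own string-to-number conversion do the work.
sub to_number {
    my ($text) = @_;
    return $text + 0;    # leading whitespace is ignored, no warning is raised
}

printf "%s\n",   to_number(' -27.000000') + 1;    # prints -26
printf "%.2f\n", to_number(' -4.110000');         # prints -4.11
```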
Re: Why won't it please be a number? (Tangential observations)
by ww (Archbishop) on Jan 26, 2010 at 18:43 UTC

    Other replies, above, offer fine answers about your emergent problem.

    This node argues for testing your statements and your data.

    Many, me included, find data entry fraught with opportunities to 'screw up' (a technical term covering 'fat-fingering the data', 'misreading the source data' (that hastily scrawled and only semi-legible entry in the checkbook), and so on). Hence, it's possible that what's in your digitized data (a spreadsheet saved as CSV, by the looks of things) might not match what's in the checkbook -- might be missing a minus sign or might have an extra digit appended.

    Some of those -- the two enumerated, for example -- can be caught without AI or exceptional proofreading. IMO, it's worth some added complexity:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 819634

    my $file = "checkdata.csv";
    # open( F, "$file" );    #replaced: see Note 1
    open(F, '<', $file) or die "Can't open $file for read: $!";
    my @lines = <F>;

    my $sum = 0;
    my ( $vplus, $v2 );

    for my $line ( @lines ) {
        my $minus_sign = 0;
        my $fractional_pennies = 0;
        my @data = '';
        chomp( $line );
        @data = split( ",", $line );

        if ( $data[2] =~ /\s*(-)(\d*\.\d{2})(\d?)/ )    # note 2
        {
            $minus_sign = $1;
            $v2 = $2;
            $fractional_pennies = $3;
            if ( $fractional_pennies =~ /[^0]/ ) {
                warn "\n\t fractional pennies present in: " . $v2,$fractional_pennies . "\n\t Ignoring the fraction.\n";
            }
        } elsif ( $minus_sign ne "-" ) {
            warn "\n\t Missing minus sign in $data[2]\n\t Continuing as if it were present";
            if ( $data[2] =~ /\s*(-){0}(\d*\.\d{2})(\d?)/ )    # Note 3
            {
                $v2= $2;
            }
        }
        print " \$v2 is: $v2\n";
        $sum = $sum + $v2;
    }
    printf("Total: %5.2f\n", $sum);

    Executed against a datafile with this (partially erroneous) content

    "Lunch Place", 1/22/2010, -4.110000, "Lunch", "", , 0
    "SomewhereElse", 1/23/2010, -17.40, "Beer", "12345", 0
    "Lunch Place", 1/25/2010, -4.755, "Lunch", 0                 # fractional pennies: highly suspect
    "Yet Another Joint", 1/24/2010, 13.240000, "Lunch", 0        # not a negative amount: data entry error?

    the above produces this output:

     $v2 is: 4.11
     $v2 is: 17.40

         fractional pennies present in: 4.755
         Ignoring the fraction.
     $v2 is: 4.75

         Missing minus sign in  13.240000
         Continuing as if it were present at F:\_wo\pl_test\819634.pl line 34, <F> line 4.
     $v2 is: 13.24
    Total: 39.50

    That differs by nearly 27 dollars (pesos, rubles, yuan...) from the "Total" you'd get were you NOT checking the sign of each entry. For bonus points, it also flags what may have been an inadvertent extra keystroke during the original data entry, or may be a misplaced decimal, a more significant error.

    That error checking has only a small cost: writing it (once) into a script that you'll likely use at least a dozen times a year. The code presented may be inelegant; it certainly can be done other ways. The test for the minus sign -- just for example -- could be made moot by using abs $v to convert the entries to positive numbers (and the code presented could be far more elegant were I not tunnel-visioned onto the notion of testing; for example, at Note 3 "(-){0}" is basically a NoOp and an excess of specificity which would be better omitted in production).
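
    A minimal sketch of the abs alternative, assuming all that is wanted is the total spent as a positive figure. The sub name total_spent is hypothetical, and the amounts are typed in here rather than read from the file:

```perl
use strict;
use warnings;

# Hypothetical helper: add up amounts as positive numbers, whatever their sign.
sub total_spent {
    my (@amounts) = @_;
    my $sum = 0;
    $sum += abs $_ for @amounts;    # abs also numifies strings like ' -17.40'
    return $sum;
}

# Two negative entries and one that is missing its minus sign.
printf "Total: %5.2f\n", total_spent( ' -4.110000', ' -17.40', ' 13.240000' );
# prints "Total: 34.75"
```

    The trade-off is that abs hides the sign entirely, so a genuine credit (a refund, say) would be counted as spending and no warning would point at the suspicious row.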

    The change flagged at 'Note 1' is "test the success of opens" -- a well established "best practice" (along, as others noted above, with use strict; use warnings;). And note that similar -- albeit more complex -- testing is undertaken at and below 'Note 2' with regard to the success of the regexen.
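
    Both checks can be sketched in a few lines. This is an illustration only: amount_or_undef is a hypothetical helper, its pattern is a simplified stand-in for the ones used above, and an in-memory filehandle replaces checkdata.csv so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured amount, or undef when the
# pattern does not match. Reading $1 without this check would silently
# reuse the capture left over from an earlier successful match.
sub amount_or_undef {
    my ($field) = @_;
    return $field =~ /^\s*(-?\d+\.\d{2})/ ? $1 : undef;
}

# An in-memory filehandle stands in for checkdata.csv here.
my $data = qq{"Lunch Place", 1/22/2010, -4.110000, "Lunch", "", , 0\n}
         . qq{"Bad Row", 1/23/2010, n/a, "Lunch", "", , 0\n};
open( my $fh, '<', \$data ) or die "Can't open in-memory data: $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my @fields = split /,/, $line;
    my $amount = amount_or_undef( $fields[2] );
    if ( defined $amount ) {
        print "amount: $amount\n";
    }
    else {
        warn "line $.: could not parse amount in '$fields[2]'\n";
    }
}
close $fh;
```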

    Obviously, these won't catch errors such as transposition of digits, well-formatted but inaccurate entries and so on... but why not let your script catch what it can?

Re: Why won't it please be a number?
by pajout (Curate) on Jan 27, 2010 at 08:13 UTC
    I do not want to be aggressive or ironic, but:

    If you are BOGGLED and CLUELESS, use strict; use warnings;