drose2211 has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to find the average of an array with this code:

#usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

open IN, '<',"part4.csv" or die "can't open input file 'part4.csv': $!\n";

my $x;
my @windsordigits;

foreach $x (<IN>){
    @windsordigits = ($x =~ /WINDSOR\sRIVERSIDE.*(\d\d?)/);
    print "@windsordigits\n";
}

my $average = sum(@windsordigits) / @windsordigits;
print "$average";

close IN;

The issue I am having is that I am getting the error "Illegal division by zero line 17, <IN> line 41". The regex only matches lines 2-14, so I am guessing this issue occurs because the regex is running all the way to line 41. Another issue is that when I print out the values, only the first digit is printed and not the second. Am I on the right track as far as the first issue? And if so, is there a way to use the regex on a range of lines so that it doesn't run all the way to line 41? Here is a line from the file I am running it on.

CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,10

Re: Illegal division by zero
by AnomalousMonk (Archbishop) on Jan 24, 2018 at 05:23 UTC

    One problem:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $x = 'CA006139520,\"WINDSOR RIVERSIDE, ON CA\",2018-01-02,10';
     ;;
     my @windsordigits;
     @windsordigits = $x =~ /WINDSOR\sRIVERSIDE.*(\d\d?)/;
     dd \@windsordigits;
    "
    [0]
    When the last two digits of the line are '10', why is only '0' captured? The  .* "consumes" as much as possible of non-newline stuff (including digits), the  \d is required for a match, but the  \d? is not, so only one digit is captured. Try something like:
    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $x = 'CA006139520,\"WINDSOR RIVERSIDE, ON CA\",2018-01-02,10';
     ;;
     my @windsordigits;
     @windsordigits = $x =~ /WINDSOR\sRIVERSIDE.*\b(\d+)\z/;
     dd \@windsordigits;
    "
    [10]
    to get all digits at the end of the line. (Update: If the line may end in an un-chomp-ed newline, use  \Z (big-Z) instead of  \z (little-z) as the end-of-line anchor.)
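
    The difference between the two anchors can be checked with a small script. This is only a sketch: last_number is a hypothetical helper that is not part of the thread, and the record is the sample line from the question, once without and once with a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): capture the trailing
# digits of a record using the given end-of-string anchor.
sub last_number {
    my ($line, $anchor) = @_;
    my $re = $anchor eq 'Z'
        ? qr/WINDSOR\sRIVERSIDE.*\b(\d+)\Z/
        : qr/WINDSOR\sRIVERSIDE.*\b(\d+)\z/;
    my ($n) = $line =~ $re;
    return $n;    # undef when the pattern does not match
}

my $chomped   = 'CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,10';
my $unchomped = "$chomped\n";

for my $anchor ('z', 'Z') {
    for my $line ($chomped, $unchomped) {
        my $n = last_number($line, $anchor);
        printf "\\%s %-9s => %s\n", $anchor,
            ($line =~ /\n\z/ ? 'unchomped' : 'chomped'),
            defined $n ? $n : 'no match';
    }
}
```

    With \z the un-chomp-ed record does not match at all, because the newline sits between the digits and the very end of the string; \Z also accepts a position just before a final newline, so both forms of the record yield 10.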

    Another problem:

    @windsordigits = ...;
    You're assigning a single item to the array on each pass through the loop; the array will never have more than a single item in it no matter how many lines you read. Try something like (untested):
    push @windsordigits, $x =~ /WINDSOR\sRIVERSIDE.*\b(\d+)\z/;
    (Update: E.g.:
    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my @windsordigits;
     ;;
     for my $x (
       'CA006139520,\"WINDSOR RIVERSIDE, ON CA\",2018-01-02,10',
       qq{CA006139520,\"WINDSOR RIVERSIDE, ON CA\",2018-01-02,987\n},
       qq{CA006139520,\"WINDSOR RIVERSIDE, ON CA\",2018-01-02,6\n},
       ) {
       push @windsordigits, $x =~ /WINDSOR\sRIVERSIDE.*\b(\d+)\Z/;
       }
     dd \@windsordigits;
    "
    [10, 987, 6]
    Note use of  \Z anchor.)

    See the flip-flop operator  .. in perlop for help with the line-range problem. (Update: See use of range operator  .. "As a scalar operator ..." in perlop.)
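
    A minimal sketch of the line-range idea, assuming the wanted records sit on a known range of input lines. The data below is made up and read from an in-memory filehandle so the example is self-contained; for the file in the question the range would presumably be 2 .. 14 rather than 2 .. 4.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Made-up sample standing in for part4.csv: a header line, three
# matching records, then an unrelated trailing line.
my $data = <<'END';
STATION,NAME,DATE,VALUE
CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,10
CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-03,20
CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-04,30
CA006139999,"SOMEWHERE ELSE, ON CA",2018-01-02,99
END

open my $fh, '<', \$data or die "can't open in-memory file: $!\n";

my @windsordigits;
while (my $line = <$fh>) {
    # With constant operands in scalar context, 2 .. 4 is compared
    # against $. (the current input line number).
    next unless 2 .. 4;
    push @windsordigits, $line =~ /WINDSOR\sRIVERSIDE.*\b(\d+)\Z/;
}
close $fh;

die "no matching lines\n" unless @windsordigits;
my $average = sum(@windsordigits) / @windsordigits;
print "values: @windsordigits\n";    # values: 10 20 30
print "average: $average\n";         # average: 20
```

    Note that the range only limits which lines are examined. The division by zero in the original code comes from the array being overwritten on every pass, so push (or a guard such as the die above) is still needed.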


    Give a man a fish:  <%-{-{-{-<

Re: Illegal division by zero
by haukex (Archbishop) on Jan 24, 2018 at 08:56 UTC

    I see a couple of issues with this code:

    • You're using a foreach where a while would be better. The former will read the entire file into memory and then iterate over the lines, while a while will read the file line-by-line.
    • On each iteration of the loop, you replace the entire contents of @windsordigits. If the last line read does not match the regex, @windsordigits will end up empty, which would explain your error. As AnomalousMonk already pointed out, you probably want to use push instead.
    • You're attempting to read what looks like a CSV file by hand. You would be much better off using Text::CSV (plus Text::CSV_XS for speed), as that will also handle the case of a comma inside quotes, which you've got there according to your sample data.
    use warnings;
    use strict;
    use Data::Dump; # Debug
    use Text::CSV;
    use List::Util qw/sum/;

    my $filename = 'part4.csv';
    my $csv = Text::CSV->new({binary=>1, auto_diag=>2});
    open my $fh, '<', $filename or die "$filename: $!";
    my @windsordigits;
    while ( my $row = $csv->getline($fh) ) {
        dd $row; # Debug
        if ( $row->[1] =~ /WINDSOR\s+RIVERSIDE/ ) {
            push @windsordigits, $row->[3];
        }
    }
    $csv->eof or $csv->error_diag;
    close $fh;
    dd @windsordigits; # Debug
    die "No digits" unless @windsordigits;
    my $average = sum(@windsordigits) / @windsordigits;
    dd $average;

    __END__

    ["CA006139520", "WINDSOR RIVERSIDE, ON CA", "2018-01-02", 10]
    ["CA006139520", "FOO", "2018-01-02", 99]
    ["CA006139520", "WINDSOR RIVERSIDE ON CA", "2018-01-02", 24]
    ["CA006139520", "RIVERSIDE WINDSOR", "2018-01-02", 99]
    ["CA006139520", "WINDSOR RIVERSIDE, ON CA", "2018-01-02", 59]
    (10, 24, 59)
    31
      use Text::CSV

      Well done sir, good idea

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!
Re: Illegal division by zero
by Random_Walk (Prior) on Jan 24, 2018 at 09:01 UTC

    You can also split the detection of WINDSOR RIVERSIDE from the extraction of the values. Here is my way ...

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(sum);

    my @windsordigits;

    while (my $x = <DATA>){
        next unless $x =~ /WINDSOR\sRIVERSIDE/;
        push @windsordigits, +(split /,/, $x)[-1];
    }

    die "No 'Windsor' digits found in input\n" unless @windsordigits;

    my $average = sum(@windsordigits) / @windsordigits;
    print "Average is: $average\n";

    __DATA__
    CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,10
    CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,20
    CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,14
    CA006139520,"WINDSOR DRIVE, ON CA",2018-01-02,10

    I use a regex to match; then, if the regex matched, I split the line on the , character and take the last element of the resulting list using the [-1] offset.
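
    As a small illustration of what the split-and-index step returns, here is a sketch using the sample record from the question; the variable names $raw, $line, @fields and $last are introduced only for this example.

```perl
use strict;
use warnings;

my $x = qq{CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,10\n};

# Without chomp, the last field keeps the line's trailing newline.
my $raw = (split /,/, $x)[-1];

# chomp a copy first to get the bare value.
chomp(my $line = $x);
my @fields = split /,/, $line;
my $last   = $fields[-1];

# The comma inside the quoted name also splits, so there are five
# fields here rather than four.
printf "fields: %d\n", scalar @fields;
print  "last: $last\n";
```

    Taking the [-1] element sidesteps the extra field created by the quoted comma, which is why a plain split is adequate for this particular column; fields counted from the left after the name would be shifted by one.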

    Update

    Added a die if no digits are detected in the input.

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!

      Just curious: suppose I had multiple names I am attempting to match, say windsor riverside, new york, and philadelphia. Would I need to make additional loops, or is it possible to push the digits for the corresponding names into different arrays within the same while loop? EDIT: Answered my own question. Just added if statements in the while loop.

        drose2211:

        Sure thing: you can use a hash to hold an array of digit entries for each city you encounter, like this:

        $ cat pm1207805.pl
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV;
        use List::Util qw(sum);
        use Data::Dumper;

        my $csv = Text::CSV->new();
        my $FH = \*DATA;

        my %accumulator;

        # Gather the digits for the cities
        while (my $row = $csv->getline($FH)) {
            my $city = $row->[1];
            my $digits = $row->[3];
            push @{$accumulator{$city}}, $digits;
        }

        # What do we have to work with?
        print "Data:\n", Dumper(\%accumulator), "\n\n";

        # Dump our results
        for my $city (sort keys %accumulator) {
            my $num_rows = @{$accumulator{$city}};
            print "$city: ", sum(@{$accumulator{$city}}) / $num_rows, "\n";
        }

        __DATA__
        CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,10
        CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,20
        CA006139520,"WINDSOR RIVERSIDE, ON CA",2018-01-02,14
        CA006138520,"NEW YORK",2018-01-02,11
        CA006137520,"PHILADELPHIA, ON CA",2018-01-02,23
        CA006137520,"PHILADELPHIA, ON CA",2018-01-02,25
        CA006138520,"NEW YORK",2018-01-02,13
        CA006138520,"NEW YORK",2018-01-02,19

        When you run it, you should get:

        $ perl pm1207805.pl
        Data:
        $VAR1 = {
                  'WINDSOR RIVERSIDE, ON CA' => [
                                                  '10',
                                                  '20',
                                                  '14'
                                                ],
                  'PHILADELPHIA, ON CA' => [
                                             '23',
                                             '25'
                                           ],
                  'NEW YORK' => [
                                  '11',
                                  '13',
                                  '19'
                                ]
                };


        NEW YORK: 14.3333333333333
        PHILADELPHIA, ON CA: 24
        WINDSOR RIVERSIDE, ON CA: 14.6666666666667

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        ... multiple names I am attempting to match. ... Just added if statements in the while loop.

        if-statement patches are probably ok for one-off or infrequent runs with a small, stable city-name list. For larger lists of cities or more frequent runs, I think I would go with a database.

        It's also possible to use a regex/hash approach:

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "my @cities = ('windsor riverside', ' new york ', 'philadelphia',);
         ;;
         my $rx_city = build_city_regex(@cities);
         print $rx_city;
         ;;
         my %city_digits;
         ;;
         RECORD:
         for my $record (
           'CA006139520,\"WINDSOR RIVERSIDE, ON CA \",2018-01-02,10',
           qq{CA006139520,\" NEW YORK , ON CA \",2018-01-02,987\n},
           'CA006139520,\"NEWYORK, ON CA \",2018-01-02,9999',
           'CA006139520,\"NEW YORK, ON CA \",2018-01-02,10210',
           qq{CA006139520,\"PHILADELPHIA, ON CA \",2018-01-02,76\n},
           ) {
           next RECORD unless my ($city, $digits) =
               $record =~ m{ ($rx_city) .* \b (\d+) \Z }xms;
           push @{ $city_digits{ canonicalize_city($city) } }, $digits
           }
         dd \%city_digits;
         ;;
         sub build_city_regex {
           my ($regex) =
               map qr{ \b (?: $_) \b }xms,
               join ' | ',
               map { (my $c = $_) =~ s{ \s+ }'\s+'xmsg; $c; }
               reverse sort
               map canonicalize_city($_),
               @_
               ;
           return $regex;
           }
         ;;
         sub canonicalize_city {
           my ($city_name) = @_;
           ;;
           die qq{bad city: '$city_name'} if $city_name =~ m{ [^[:alpha:] -] }xms;
           $city_name =~ s{ \A \s+ | \s+ \z }''xmsg;
           $city_name =~ s{ \s+ }' 'xmsg;
           $city_name = uc $city_name;
           ;;
           return $city_name;
           }
        "
        (?msx-i: \b (?: WINDSOR\s+RIVERSIDE | PHILADELPHIA | NEW\s+YORK) \b )
        {
          "NEW YORK" => [987, 10210],
          PHILADELPHIA => [76],
          "WINDSOR RIVERSIDE" => [10],
        }
        Something like this will work even with large lists (thousands!) of city names. However, as I said, for a sufficiently high size-frequency metric, it's probably better to use a database.


        Give a man a fish:  <%-{-{-{-<