GertMT has asked for the wisdom of the Perl Monks concerning the following question:

The following code I've written to summarize a few data files of which one is given below. All works fine except that the data I've received very often has a last line with just a simple space character. This makes that the script runs an error because it's not able to split the last line!? Has anybody got a solution to this?
Thanks in advance,
Gert
#!/usr/bin/perl -w use strict; use warnings; use diagnostics; my %saldi; while (<>) { my @cellen = ( split /,/, )[ 3, 4 ]; $saldi{ $cellen[0] } += $cellen[1]; if ( eof(ARGV) ) { $ARGV =~ m/^(\S+)\.txt/; print "$1\n"; foreach my $name ( keys %saldi ) { print "\t$name\t$saldi{$name}\n"; } } }
Data
394,eur,2006,D,18.20 394,eur,2006,D,22 394,eur,2006,C,25

Replies are listed 'Best First'.
Re: Split pattern doesn't match last line of file
by bobf (Monsignor) on Jan 02, 2007 at 05:12 UTC

    I frequently use

    next if $line =~ m/^\s+$/;
    to skip lines that contain only white space. If you add a similar check (using $_ instead of $line, of course) to your while loop before the split statement it should solve the problem.

    Update: Depending on how variable the input data is, you could also add a check to ensure that @cellen contains the expected number of elements and that they are defined before you try to access them.

      ++bobf; I often used that line, but recently changed to using this form of it:

      next if not /\S/; # Skip blank lines
      I found that it was too easy to leave out (via typo) the ^,+, or $ in m/^\s+$/, and too hard to verify by eye.

      With my brain under the influence of Perl Best Practices, I see some possible bugs, and opportunities for improvements.

      • %saldi is not getting cleared between files! Is this what you want?
      • Magic <> is nice, but when you need eof() processing over multiple files, consider changing to do your own opens/closes, etc.
      • Sort your hash keys, or your output order may vary between runs.
      • Your totals will not line up when printed; consider printf() to format the output.
      • The code will be clearer if you name your fields instead of treating them as numbers, especially since the numbers change! (fields 3,4 become 0,1)
      • You are not chomping your input, so your last field contains newlines. It has not caused problems because the last field is only handled numerically.
      Working, tested example code:
      use strict; use warnings; foreach my $filename (@ARGV) { open my $fh, '<', $filename or warn "Can't open $filename: $!" and next; my %saldi; while ( <$fh> ) { chomp; next if not /\S/; # Skip blank lines my @fields = split /,/; if ( @fields != 5 ) { warn "Bad number of fields in file '$filename' line $."; next; } my ( $department, $currency, $year, $type, $amount ) = @fields +; $saldi{ $type } += $amount; } my ($basename) = ( $filename =~ m/^(\S+)\.txt/ ) or warn; print "$basename\n"; foreach my $type ( sort keys %saldi ) { my $total_amount = $saldi{$type}; printf "\t%-7s\t%7.2f\n", $type, $total_amount; } close $fh or warn "Can't close $filename: $!"; }

      guess it's a bit to advanced for me as I can't get it working though I understand the solution. Could you elaborate a bit more on where to put the
      next if $line =~ m/^\s+$/; and the $_?

        Sure - sorry for being too cryptic. Sometimes it's hard to tell how much detail is needed. :-)

        Insert a new line as follows (code snippet taken from your OP):

        while (<>) { next if m/^\s+$/; # <-- add this line my @cellen = ( split /,/, )[ 3, 4 ];

        The regex is anchored to the start (^) and end ($) of the string, and the pattern consists of one or more white space characters (\s+). See perlre and Pattern Matching, Regular Expressions, and Parsing (in the Tutorials section) for more information.

        Incidently, the code I added is a bit shorter than it could be, since it utilizes $_. A more verbose version could be:

        next if $_ =~ m/^\s+$/;
        or, if you wanted to explicitly assign the input text to a variable before processing, you could do something like this:
        while ( my $line = <>) { next if $line =~ m/^\s+$/; # <-- add this line my @cellen = ( split( /,/, $line ) )[ 3, 4 ];

        HTH

Re: Split pattern doesn't match last line of file
by SheridanCat (Pilgrim) on Jan 02, 2007 at 05:37 UTC
    There are a couple things going on that will make this all a bit bettr, I think.

    First off, using the "-w" switch and "use warnings" is redundant. That's just a minor nit.

    It would be common to do the enumeration of the %saldi hash outside of the while() loop. The while loop will exit when there is no further data to process. At that point you'll know you've reached EOF, so there's no need to test for it as you are there.

    It's also a good practice to test for the validity of the incoming data before you try to do anything significant with it. So, a check of the incoming data along with a check of the @cellen array after the split will keep you from trying to do things with bogus data. Here's just a quick example of what I'm talking about:

    #!/usr/bin/perl use strict; use warnings; use diagnostics; my %saldi; while (<>) { chomp; next unless $_; my @cellen = ( split /,/, )[ 3, 4 ]; next unless $cellen[0] && $cellen[1]; $saldi{ $cellen[0] } += $cellen[1]; } $ARGV =~ m/^(\S+)\.txt/; print "$1\n"; foreach my $name ( keys %saldi ) { print "\t$name\t$saldi{$name}\n"; }
      SheridanCat, Util and bobf, all proposed code is very helpful and works! Additionally I get a much better understanding of what's actually going on if the code runs.
      thanks again
Re: Split pattern doesn't match last line of file
by graff (Chancellor) on Jan 02, 2007 at 05:05 UTC
    How about this:
    ... while (<>) { next unless (/,/); ...
    In other words, simply skip lines that don't contain a comma. Everything else could stay the same.
      couldn't get this to work. I'll investigate further
        It shouldn't be that much trouble. If the OP script works most of the time (except when there's a file with a blank line), then just adding the "next" statement as shown (first line of code inside the while loop) should work all the time:
        #!/usr/bin/perl -w use strict; use warnings; use diagnostics; my %saldi; while (<>) { next unless ( /,/ ); ## add this line my @cellen = ( split /,/, )[ 3, 4 ]; $saldi{ $cellen[0] } += $cellen[1]; if ( eof(ARGV) ) { $ARGV =~ m/^(\S+)\.txt/; print "$1\n"; foreach my $name ( keys %saldi ) { print "\t$name\t$saldi{$name}\n"; } } }