dukea2006 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
I'm struggling with something that I think is pretty simple, but I haven't been able to work it out and I'm hoping you all can lend a hand.
I have a Tab Delimited file similar to the following:

10	alpha
30	bravo
60	charlie
100	delta
500	echo
600	foxtrot
4	golf
22	hotel
900	igloo
800	juliet
999	kilo

What I need to be able to do is count those values in the first column that meet certain criteria such as >600,>1000, etc. So, using the data above, if I wanted to count those values that are > 600, I want a total result of "3".
I am able to read the file and parse the data with the following code but the count that I am using doesn't give me the total "3". Rather, it outputs:
1
1
1

Again, I think I am just missing something basic here - so, any suggestions would be appreciated.

#!/usr/bin/perl
use warnings;
use strict;

#open and read, the file
my $file = shift @ARGV;
open (FILE1, "<", $file) or die "Can't open '$file': $!";
while (<FILE1>) {
    chomp $_;
    my($perfNum,$alphaVal) = split("\t", $_);
    #print "$perfNum\n";
    my$gt600ms=0;
    if($perfNum > 600) {
        $gt600ms++;
        print "$gt600ms\n";
    }
}
close (FILE1);

Re: Getting the right count
by flexvault (Monsignor) on Dec 30, 2011 at 20:22 UTC

    Move the initialization of $gt600ms to before your while loop, and move the print to after the loop (before or after the close), but not inside the loop!

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin

      Yep, that was it. Thank you for the quick response; it's much appreciated!
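
For reference, the fix described above might look roughly like the following. This is only a sketch: the helper name count_over and the in-memory sample are assumptions made so the example runs on its own, and the original script reads from a file named on the command line instead.

```perl
use strict;
use warnings;

# count_over() is a hypothetical helper name, not taken from the thread.
# It counts records whose first tab-separated field exceeds $cutoff.
sub count_over {
    my ($fh, $cutoff) = @_;
    my $count = 0;                       # initialized once, before the loop
    while (my $line = <$fh>) {
        chomp $line;
        my ($perfNum) = split /\t/, $line;
        $count++ if $perfNum > $cutoff;  # only the increment stays in the loop
    }
    return $count;
}

# In-memory sample standing in for the real file (an assumption for the demo).
my $sample = "10\talpha\n600\tfoxtrot\n900\tigloo\n800\tjuliet\n999\tkilo\n";
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
print count_over($fh, 600), "\n";        # printed once, after the loop: 3
close $fh;
```

Passing the cutoff as a parameter also covers the other criteria mentioned in the question (>1000 and so on) without duplicating the loop.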

Re: Getting the right count
by ww (Archbishop) on Dec 30, 2011 at 20:57 UTC
    The explanation above deals with your question; now, an unsolicited comment; not because it's better, but because it's a useful alternative in some cases:
    #!/usr/bin/perl
    use Modern::Perl;
    # 945665

    my $count=0;
    my $num;
    my @arr = <DATA>;

    for (@arr) {
        $num=0;                  # nul $num to avoid probs from persistence in $1
        if ($_ =~ /(\d+)/ ) {    # as an alt to splitting on the tab
            $num = $1;
        }
        if ($num > 600) {
            say "$num > 600ms";  # OP didn't ask for this, but /me
                                 # likes visual confirmation in the output
            $count++;
        } else {
            next;
        }
    }
    say "\t count: $count;"
    __DATA__
    10	alpha
    30	bravo
    60	charlie
    100	delta
    500	echo
    600	foxtrot
    4	golf
    22	hotel
    900	igloo
    800	juliet
    999	kilo

      I can see two potential disadvantages to this:

      • The file is needlessly loaded into memory all at once. Nothing to worry about with the files the author quoted, but why do it if a while() will do just as well?
      • If there is some kind of format violation like a blank line or one that doesn't start with a number, the variant using split will always spit out a warning while yours will silently use zero (which may be what the condition is checking for).
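
Both points can be addressed together. The sketch below reads one record at a time and reports malformed lines explicitly instead of either warning implicitly or silently counting zero. The helper name count_checked, the anchored pattern, and the sample data are assumptions for illustration, not code from this thread.

```perl
use strict;
use warnings;

# count_checked() is a hypothetical helper for illustration only.
# Returns (number of records over $cutoff, number of malformed lines skipped).
sub count_checked {
    my ($fh, $cutoff) = @_;
    my ($count, $skipped, $lineno) = (0, 0, 0);
    while (my $line = <$fh>) {            # one record at a time, no slurping
        $lineno++;
        if ($line =~ /^(\d+)\t/) {        # the number must start the record
            $count++ if $1 > $cutoff;
        } else {
            $skipped++;
            warn "line $lineno: no leading number, skipped\n";
        }
    }
    return ($count, $skipped);
}

# Sample with a blank line and a record that does not start with a number.
my $sample = "10\talpha\n\nbravo\t30\n900\tigloo\n";
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my ($count, $skipped) = count_checked($fh, 600);
close $fh;
print "count: $count, skipped: $skipped\n";   # count: 1, skipped: 2
```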
        mbethke:
        • The first (++) is, indeed, a "potential" disadvantage... but it is easily remedied if the need arises. In any case, slurping the file is not the point; use of a regex to ID files is.
        • The second -- IMO -- is pretty much specious. Did you try inserting a "format violation" (or several)?

        The only decimal cases which affect the output that I've discovered are
        1) a record which consists solely of a number (clearly, a case which is cause for concern) or
        2) a number using a thousands separator. (Admitted: I followed the sample data, but, as you can see from the regex, numbers with 4 or more digits satisfy the test so long as any thousands separator is omitted. And if it's present, the regex needs minor modification, followed by a function to remove the offending punctuation.)

        If the first is a case to worry about, spit out the entire record whenever a number satisfies the test. If the latter, the data is sufficiently suspect that its content should be validated... which is a different kettle of fish. So too is the case in which the numbers are binary or octal or hex or lakhs or ....

        But, again, illustrating all that seemed OT to me; TIMTOWTDI is the point.
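
For anyone who does hit the thousands-separator case, the "minor modification" plus a cleanup step could be sketched as below. The helper name leading_number and the assumption that the separator is a comma are both illustrative; neither comes from the posts above.

```perl
use strict;
use warnings;

# leading_number() is a hypothetical helper. It assumes a comma is the
# thousands separator, which is a guess about the data, not a given.
sub leading_number {
    my ($record) = @_;
    # Accept either comma-grouped digits (1,250) or a plain run of digits.
    return unless $record =~ /^\s*(\d{1,3}(?:,\d{3})+|\d+)/;
    (my $num = $1) =~ tr/,//d;     # strip the separators before comparing
    return $num;
}

for my $record ("999\tkilo", "1,250\tlima", "12000\tmike") {
    my $num = leading_number($record);
    print "$num\n" if defined $num && $num > 600;
}
```

Records with no leading number return undef, so the caller decides whether to skip them or report them.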

Re: Getting the right count
by Marshall (Canon) on Dec 30, 2011 at 21:26 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $count=0;
    while (<DATA>) {
        # split with no arguments splits $_ on whitespace
        # (like /\s+/, but leading whitespace is skipped), which includes \t
        # white space characters are: \t\f\r\n and space
        my ($perfNum) = split;    #first token of split
        $count++ if ($perfNum > 600);
    }
    print "count of perfNums > 600: $count\n";

    #prints: count of perfNums > 600: 3

    __DATA__
    10	alpha
    30	bravo
    60	charlie
    100	delta
    500	echo
    600	foxtrot
    4	golf
    22	hotel
    900	igloo
    800	juliet
    999	kilo
Re: Getting the right count
by JavaFan (Canon) on Dec 31, 2011 at 03:23 UTC
    $ perl -naF'\t' -E'$n+=$F[0]>600;END{say$n}' your-data
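
Spelled out, the one-liner above does roughly the following: -n wraps the code in a read loop, -a with -F'\t' splits each line on tabs into @F, and the END block prints the total once input is exhausted. The sub name and in-memory sample below are assumptions so the sketch can run standalone.

```perl
use strict;
use warnings;
use feature 'say';

# count_first_field_over() is a hypothetical name for illustration.
sub count_first_field_over {
    my ($fh, $cutoff) = @_;
    my $n = 0;
    while (my $line = <$fh>) {             # what -n provides
        my @F = split /\t/, $line;         # what -a with -F'\t' provides
        $n += ($F[0] > $cutoff) ? 1 : 0;   # a true comparison adds 1
    }
    return $n;                             # the one-liner prints this in END
}

# In-memory sample standing in for "your-data" (an assumption for the demo).
my $sample = "10\talpha\n900\tigloo\n999\tkilo\n";
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
say count_first_field_over($fh, 600);      # 2
close $fh;
```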
Re: Getting the right count
by TJPride (Pilgrim) on Dec 31, 2011 at 02:13 UTC
    use strict;
    use warnings;

    my $cutoff = 600;
    my $c = 0;

    while (<DATA>) {
        $c++ if (split /\t/)[0] > $cutoff;
    }

    print "$c over $cutoff\n";

    __DATA__
    10	alpha
    30	bravo
    60	charlie
    100	delta
    500	echo
    600	foxtrot
    4	golf
    22	hotel
    900	igloo
    800	juliet
    999	kilo

    Or alternatively:

    $c++ if m/^(\d+)/ && $1 > $cutoff;