in reply to Any easy way to do this?

So, you're expecting a user to come up with a set of threshold values to put on the command line? How likely is it, really, that a user will want to try a bunch of different variations of threshold values? (In fact, how likely is it that a user already knows what ranges of values are going to be useful?)

Any a priori assumptions that might provide sensible aid to reduce the user's "cognitive load" would be worth building into the script -- e.g. maybe threshold values should always be evenly spaced over an appropriate range, and users would just say how many thresholds (histogram bins) they want on a given run.
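As a rough sketch of that idea (the helper name even_thresholds and the 1000 to 10000 range are only assumptions for illustration, not part of the posted script), evenly spaced thresholds could be generated from a bin count like this:

```perl
use strict;
use warnings;

# Hypothetical helper: build $count evenly spaced thresholds
# running from $min to $max inclusive.
sub even_thresholds {
    my ($min, $max, $count) = @_;
    die "need at least two thresholds\n" if $count < 2;
    my $step = ($max - $min) / ($count - 1);
    return map { $min + $_ * $step } 0 .. $count - 1;
}

my @thresh = even_thresholds(1000, 10000, 4);
print "@thresh\n";    # 1000 4000 7000 10000
```

The user then only has to choose one number (how many thresholds) instead of typing a list of values.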

Regarding the code you posted, I'd offer a few "stylistic" points:

use Getopt::Long; # or Getopt::Std, which might be easier to grok.
That will make it easy to offer useful default values for things like number of bins, start-time and end-time. There could even be a default value for the name of the log file to read.
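A minimal sketch of what that might look like; every option name and default value here (bins, start, end, log, 'app.log') is a placeholder I made up, not something taken from your script:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: start from default values, then let
# command-line options override them.
sub parse_options {
    my @args = @_;
    my %opt = (
        bins  => 4,             # assumed default number of bins
        start => '00:00:00',    # assumed default start-time
        end   => '23:59:59',    # assumed default end-time
        log   => 'app.log',     # assumed default log file name
    );
    GetOptionsFromArray( \@args, \%opt, 'bins=i', 'start=s', 'end=s', 'log=s' )
        or die "bad options; expected --bins N, --start T, --end T, --log FILE\n";
    return %opt;
}

my %opt = parse_options(@ARGV);
print "bins=$opt{bins} log=$opt{log}\n";
```

Run with no arguments, this just reports the defaults; something like "--bins 6" changes only that one setting.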

Perl gives a warning about line 59 -- it's harmless, but worth fixing.

When there's an "if" block that always ends with "exit 1" (which should just be "die"), there's no need for an "else" block after that (you can eliminate a layer of nesting). Likewise, you don't need an "else" block that contains just a "next" statement, given that there's nothing after that block in the enclosing loop.
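To illustrate the "die" and "else" points with a made-up routine (count_matching and its arguments are hypothetical, not from your script), the flattened shape might look like this:

```perl
use strict;
use warnings;

# Hypothetical example: count the lines that contain a given string.
sub count_matching {
    my ($lines, $want) = @_;

    # "die" ends the routine, so nothing after it needs an "else" block.
    die "no search string given\n" unless defined $want;

    my $count = 0;
    for my $line (@$lines) {
        # Skip early instead of wrapping the rest of the loop in if/else.
        next unless $line =~ /\Q$want\E/;
        $count++;
    }
    return $count;
}

print count_matching( [ 'a ERROR', 'b ok', 'c ERROR' ], 'ERROR' ), "\n";    # 2
```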

Assuming you have an array of threshold values, you just need to make sure the array values are sorted, and loop over them to work out which bin a given value should be counted in -- here's a simple example that leaves aside all your other issues about selecting/excluding log entries:

my @thresh = ( 1000, 4000, 7000, 10000 );
my @bins;
while (<LOG>) {
    my $val = ( split )[10];
    next unless ( $val =~ /^\d+$/ );
    my $i;
    for $i ( 0 .. $#thresh ) {
        last if ( $val < $thresh[$i] );
    }
    $bins[$i]++;
}
(UPDATED to give appropriate scope to $i -- thanks to wfsp for pointing that out.)

Geez! As GrandFather points out below, I really didn't get that right. Even after wfsp had told me it wouldn't work, I still had it wrong. What I should have suggested was something like this (thanks, GrandFather):

my @thresh = ( 1000, 4000, 7000, 10000 );
my @bins;
while (<LOG>) {
    my $val = ( split )[10];
    next unless ( $val =~ /^\d+$/ );
    my $i = 0;
    while ( $i < @thresh and $val > $thresh[$i] ) {
        $i++;
    }
    $bins[$i]++;
}
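For trying the same logic without a log file, here is a self-contained sketch; the bin_index name and the sample values are made up for illustration, and the numeric sort is there because the loop only works on sorted thresholds:

```perl
use strict;
use warnings;

# Return the bin index for $val: the number of leading thresholds
# that $val exceeds (0 means "at or below the first threshold").
sub bin_index {
    my ($val, @thresh) = @_;
    my $i = 0;
    $i++ while $i < @thresh and $val > $thresh[$i];
    return $i;
}

# Sort numerically so the caller can supply thresholds in any order.
my @thresh = sort { $a <=> $b } ( 7000, 1000, 10000, 4000 );
my @bins   = (0) x ( @thresh + 1 );

# Made-up sample values standing in for the field read from the log.
$bins[ bin_index( $_, @thresh ) ]++ for ( 500, 1000, 1001, 6999, 12000 );

for my $i ( 0 .. $#bins ) {
    my $label = $i < @thresh ? "<= $thresh[$i]" : "> $thresh[-1]";
    print "$label: $bins[$i]\n";
}
```

Note that there is one more bin than there are thresholds: the last bin collects everything above the largest threshold.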

Re^2: Any easy way to do this?
by jb60606 (Acolyte) on Sep 15, 2011 at 23:23 UTC

    Thanks Graf, I'll give this a try tonight or tomorrow.

    Realistically, the user will likely specify thresholds between 1000 and 10,000 spaced by about 5000. So in all likelihood 1000, 5000 and 10000 are probably the only thresholds that this script will ever see, but I wanted to leave the user's options open.

    You're probably right, and I should just hard-code a range of thresholds to be run by default and maybe provide an override to run the script using a single user-defined threshold.

      Note the correction to my code snippet -- if $i were lexically scoped in the "for" statement (as originally posted), it would be unavailable after exiting that loop.

        That must have been an ENOCOFFEE error! Consider:

        use strict;
        use warnings;

        my $i = "Nothing to see here, move along\n";

        for $i (1 .. 3) {
            print "$i ";
        }

        print $i;

        Prints:

        1 2 3 Nothing to see here, move along

        The for loop variable is aliased to the elements in the for list. Any global (to the loop) variable that happens to have the same name is unaffected by the loop and, in particular, does not end up with the last contents of the loop variable!
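One way around that, sketched here with made-up values (the $bin variable and the sample $val are only for illustration), is to copy the loop variable into an outer variable before leaving the loop. The second half shows the other side of aliasing: assigning to the loop variable changes the array itself.

```perl
use strict;
use warnings;

my @thresh = ( 1000, 4000, 7000, 10000 );
my $val    = 5000;    # made-up sample value

# Copy the index out of the loop; $i itself is gone after the loop ends.
my $bin = @thresh;    # default: above every threshold
for my $i ( 0 .. $#thresh ) {
    if ( $val <= $thresh[$i] ) {
        $bin = $i;
        last;
    }
}
print "bin $bin\n";    # bin 2

# Because the loop variable is an alias, this doubles the array elements.
my @copy = @thresh;
$_ *= 2 for @copy;
print "@copy\n";       # 2000 8000 14000 20000
```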

        True laziness is hard work