in reply to Re: build a distribution
in thread build a distribution

Dear roboticus,

Thank you for your helpful remarks. I have inserted code tags.

I'll try to clarify my task. I would like to sort the scored lengths into intervals. For example, for the following data I would like to count the number of items that are less than 10, then the number of items between 10 and 20, 20 and 30, 30 and 40, and so on. I actually need to build several distributions at different levels of detail, so besides 10 the interval width might be 2, 6, 12, and so on.
3 3 5 7 8 8 12 13 15 16 20 25 34 34 31 38 40 40

The actual output should be something like this:

0-10   6 items
10-20  5 items
20-30  1 item
30-40  6 items

Thank you once more.

Re^3: build a distribution
by roboticus (Chancellor) on Aug 07, 2010 at 14:28 UTC

    Grig:

    OK, then the way I'd approach the task would be something like this:

    my %bins;

    open my $INF, '<', $FileName or die $!;
    while (<$INF>) {
        chomp;
        $bins{get_bin($_)}++;
    }

    printf "%-6.6s %u items\n", $_, $bins{$_} for sort keys %bins;

    sub get_bin {
        # Determine the name of the bin to put the value into
        my $val = shift;
        my $bin_min = int($val / 10);
        my $bin_max = $bin_min + 10;
        return "$bin_min-$bin_max";
    }

    You'll want to wrap in some error checking, testing, as well as any options you want...
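A minimal sketch of what such error checking might look like, assuming the input holds one value per line and that negative or non-numeric entries should be skipped with a warning. The helper name `clean_values` is hypothetical and not part of the script above; `looks_like_number` comes from the core module Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: keep only the lines that hold a usable,
# non-negative number and warn about everything else.
sub clean_values {
    my @lines = @_;                       # work on a copy of the input
    my @values;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        next if $line eq '';              # silently skip blank lines
        if (looks_like_number($line) && $line >= 0) {
            push @values, $line;
        }
        else {
            warn "Skipping unusable value: '$line'\n";
        }
    }
    return @values;
}

my @values = clean_values("3", " 12 ", "", "abc", "-4", "40");
print "@values\n";
```

The cleaned list could then be fed to the binning loop in place of the raw lines read from the file.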

    ...roboticus

      Your script is very nice. Unfortunately, I am not yet familiar with all of Perl's cryptic syntax, but if the following data are used
      1 1 2 3 3 3 6 10 13 20 22 22 22 22 23 34 34 34 35 36 37 37 40 41 42 43
      your script gives the following output:
      0-10   7 items
      1-11   2 items
      2-12   6 items
      3-13   7 items
      4-14   4 items

      How can it be changed to give the number of items for the following intervals?
      0-10 10-20 20-30 30-40

      Thank you for your help.

        Grig:

        Ah, I made a mistake in the subroutine. It was:

        sub get_bin {
            # Determine the name of the bin to put the value into
            my $val = shift;
            my $bin_min = int($val / 10);
            my $bin_max = $bin_min + 10;
            return "$bin_min-$bin_max";
        }

        but it should have been (untested):

        sub get_bin {
            # Determine the name of the bin to put the value into
            my $val = shift;
            my $bin_min = 10*int($val / 10);
            my $bin_max = $bin_min + 10;
            return "$bin_min-$bin_max";
        }

        I forgot the "10*" when computing $bin_min, so instead of 0, 10, 20, etc., it was using 0, 1, 2, ...
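Since the original question also asks for other interval widths (2, 6, 12, and so on), the same formula can be sketched with the width as a parameter. This is only an illustration under some assumptions: the names `get_bin_min` and `build_distribution` are hypothetical, the values are taken from an in-memory list rather than a file, and all values are assumed to be non-negative (because `int` truncates toward zero).

```perl
use strict;
use warnings;

# Hypothetical generalisation of get_bin: the bin width is a parameter
# instead of the hard-coded 10. Returns the lower bound of the bin.
# Assumes $val >= 0, since int() truncates toward zero.
sub get_bin_min {
    my ($val, $width) = @_;
    return $width * int($val / $width);
}

# Count the values per bin; the result is keyed by each bin's lower bound.
sub build_distribution {
    my ($width, @values) = @_;
    die "Bin width must be positive\n" unless $width > 0;
    my %bins;
    $bins{ get_bin_min($_, $width) }++ for @values;
    return %bins;
}

my @data = qw(1 1 2 3 3 3 6 10 13 20 22 22 22 22 23
              34 34 34 35 36 37 37 40 41 42 43);
my $width = 10;
my %bins  = build_distribution($width, @data);

# Sort numerically on the lower bound, so that a bin such as 100-110
# would follow 90-100 instead of landing between 10-20 and 20-30.
for my $min (sort { $a <=> $b } keys %bins) {
    printf "%-8s %u items\n", $min . '-' . ($min + $width), $bins{$min};
}
```

With this formula a value that sits exactly on a boundary, such as 20 with a width of 10, is counted in the upper bin (20-30). Empty bins are not printed, because only bins that received at least one value exist in the hash.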

        ...roboticus

Re^3: build a distribution
by toolic (Bishop) on Aug 07, 2010 at 14:29 UTC