in reply to build a distribution

Grig:

Sorry, your requirements aren't clear enough for a simple answer. First, I can't tell what your data file looks like because you didn't use code tags (<c>insert code here</c>). I can't tell if it's a single line of numbers, or a single number per line, or doubles, triples, ...

Second, you don't specify what distribution(s) you're interested in, nor how to partition your bins. I'm not even certain whether you're trying to generate some fake data for testing, or process data in some way.

While I could make various guesses, they probably wouldn't help you, and many monks here don't want to spend time on something that won't be of any use. Update your node a bit, clarify your question and requirements, and you should get some helpful results. It would be best if you try to code something up, and show us where you're having trouble. The more effort you put into your question, the better we can assist.

...roboticus

Re^2: build a distribution
by Grig (Novice) on Aug 07, 2010 at 14:11 UTC

    Dear roboticus,

    Thank you for your helpful remarks. I have inserted code tags.

    I'll try to clarify my task. I would like to sort the scored lengths into intervals. For example, for the following data I would like to count the number of items less than 10, then the number of items between 10 and 20, 20 and 30, 30 and 40, and so on. Actually, I need to build several distributions with different levels of detail, so besides 10 the interval width might be 2, 6, 12, and so on.
    3 3 5 7 8 8 12 13 15 16 20 25 34 34 31 38 40 40

    the actual output should be something like this:

    0-10 6 items
    10-20 5 items
    20-30 1 item
    30-40 6 items
    Thank you once more.
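The expected output above treats each interval as closed on the right: 20 is counted in 10-20 and both 40s in 30-40. A minimal Perl sketch that reproduces those counts with an adjustable interval width might look like this. The names `bin_index_right_closed` and `count_bins` are hypothetical, the data is inlined rather than read from a file, and it assumes non-negative values.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: index of the right-closed bin (lo, hi] that holds
# $val, so 20 lands in 10-20 and 40 in 30-40. Assumes $val >= 0; values
# from 0 up to $width (inclusive) all go into bin 0.
sub bin_index_right_closed {
    my ($val, $width) = @_;
    return 0 if $val <= $width;
    return ceil($val / $width) - 1;
}

# Hypothetical helper: count values per bin, returning bin index => count.
sub count_bins {
    my ($width, @values) = @_;
    my %count;
    $count{ bin_index_right_closed($_, $width) }++ for @values;
    return %count;
}

my @data  = (3, 3, 5, 7, 8, 8, 12, 13, 15, 16, 20, 25, 34, 34, 31, 38, 40, 40);
my $width = 10;    # change to 2, 6, 12, ... for other levels of detail
my %count = count_bins($width, @data);

# Print non-empty bins in numeric order; empty bins are simply skipped.
for my $i (sort { $a <=> $b } keys %count) {
    my ($lo, $hi) = ($i * $width, ($i + 1) * $width);
    printf "%s-%s %d item%s\n", $lo, $hi, $count{$i}, $count{$i} == 1 ? '' : 's';
}
```

With a width of 10 this prints the same four lines as the expected output. Storing only the bin index keeps the counting independent of the width, so several distributions can be built from the same data by calling `count_bins` with different widths.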

      Grig:

      OK, then the way I'd approach the task would be something like this:

      ```perl
      my %bins;
      open my $INF, '<', $FileName or die $!;
      while (<$INF>) {
          chomp;
          $bins{get_bin($_)}++;
      }
      printf "%-6.6s %u items\n", $_, $bins{$_} for sort keys %bins;

      sub get_bin {
          # Determine the name of the bin to put the value into
          my $val = shift;
          my $bin_min = int($val / 10);
          my $bin_max = $bin_min + 10;
          return "$bin_min-$bin_max";
      }
      ```

      You'll want to wrap it in some error checking and testing, and add any options you need...
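As a rough idea of the error checking and options mentioned above, here is a minimal sketch, assuming the bin width comes first on the command line and an optional file name second. `read_values` and `parse_args` are hypothetical helpers; non-numeric tokens are warned about and skipped, and numbers may be one per line or many per line.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: collect every whitespace-separated token from $fh,
# keeping numeric tokens and warning about (then skipping) anything else.
# Works for one number per line as well as many numbers on one line.
sub read_values {
    my ($fh) = @_;
    my @values;
    while (my $line = <$fh>) {
        for my $tok (split ' ', $line) {
            if (looks_like_number($tok)) {
                push @values, $tok;
            }
            else {
                warn "line $.: skipping non-numeric value '$tok'\n";
            }
        }
    }
    return @values;
}

# Hypothetical option handling: bin width first (default 10), then an
# optional file name; falls back to STDIN when no file is given.
sub parse_args {
    my @args  = @_;
    my $width = @args ? shift @args : 10;
    die "bin width must be a positive number, got '$width'\n"
        unless looks_like_number($width) && $width > 0;
    return ($width, \*STDIN) unless @args;
    my $file = shift @args;
    open my $fh, '<', $file or die "can't open '$file': $!\n";
    return ($width, $fh);
}

# Demo on an in-memory "file" (open on a scalar reference, Perl 5.8+).
my $sample = "3 3 5\n7 eight 8\n12\n";
open my $in, '<', \$sample or die "can't open in-memory file: $!\n";
my @vals = read_values($in);
print scalar(@vals), " values read: @vals\n";
```

The demo prints `6 values read: 3 3 5 7 8 12` and warns about `eight` on STDERR. In a real script you would call `my ($width, $fh) = parse_args(@ARGV);` and then pass `$fh` to `read_values`.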

      ...roboticus

        Your script is very nice. Unfortunately, I'm not yet familiar with all of Perl's cryptic syntax, but if I use the following data
        1 1 2 3 3 3 6 10 13 20 22 22 22 22 23 34 34 34 35 36 37 37 40 41 42 43
        your script gives the following output:
        0-10 7 items
        1-11 2 items
        2-12 6 items
        3-13 7 items
        4-14 4 items
        How can it be changed to get the number of items for these intervals: 0-10, 10-20, 20-30, 30-40?

        Thank you for your help.
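The output above shows where `get_bin` goes wrong: `int($val / 10)` yields the bin *number* (0, 1, 2, ...), not the bin's lower bound, so 13 ends up in "1-11". Multiplying that number by the width gives the lower bound. A minimal sketch of that fix, assuming non-negative values, follows; `$BIN_WIDTH` and `get_bin_start` are hypothetical names, and the data is inlined instead of read from `$FileName`. Bins are keyed by their numeric lower bound so they can be sorted numerically, since a plain string `sort` would put "100-110" before "20-30".

```perl
use strict;
use warnings;

# Hypothetical setting: interval width (try 2, 6, 12, ... as well).
my $BIN_WIDTH = 10;

# Lower bound of the half-open bin [start, start + $BIN_WIDTH) holding $val.
# int() truncates toward zero, so this assumes non-negative values.
sub get_bin_start {
    my $val = shift;
    return int($val / $BIN_WIDTH) * $BIN_WIDTH;
}

my @data = (1, 1, 2, 3, 3, 3, 6, 10, 13, 20, 22, 22, 22, 22, 23,
            34, 34, 34, 35, 36, 37, 37, 40, 41, 42, 43);

# Key each bin by its numeric lower bound rather than by its label.
my %bins;
$bins{ get_bin_start($_) }++ for @data;

# Numeric sort on the lower bound, then build the "lo-hi" label for printing.
for my $start (sort { $a <=> $b } keys %bins) {
    printf "%-7s %u items\n", $start . '-' . ($start + $BIN_WIDTH), $bins{$start};
}
```

For this data it prints bins 0-10, 10-20, 20-30, 30-40 and 40-50 with 7, 2, 6, 7 and 4 items. The values 40 to 43 get their own 40-50 bin because each interval includes its lower bound but not its upper bound. To keep reading from the file, the original `while (<$INF>)` loop can stay as it is, with only the call changed to `get_bin_start`.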