in reply to Re: Re: Re: Random entry from combined data set
in thread Random entry from combined data set

As you describe it, lines near the start of the file are more likely to be chosen than lines near the end. So the answer is no: samples collected with the algorithm you describe will be biased, and their distribution will not match that of the underlying population.

Re: Re: Re: Re: Re: Random entry from combined data set
by John M. Dlugosz (Monsignor) on Jul 04, 2001 at 21:33 UTC
    That's what I'm thinking. Although induction seems to say it works out, I suspect the odds of selecting the nth line will not balance out against all the previous choices combined.

    Update: It does. See the code in my original reply to this thread. I have not run any statistics on the output, but at a glance it looks uniform, not obviously biased toward one end or the other.
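For anyone who wants numbers rather than a casual look, here is a minimal sketch of such a check. It is not the code from the original reply. It assumes the method under discussion is the usual one-pass pick, where the nth line replaces the current choice with probability 1/n, and it is written in Python purely for illustration. The names `pick_random_line` and `tally` are hypothetical. Under that assumption the induction goes through: line k survives to the end with probability (1/k) * (k/(k+1)) * ... * ((n-1)/n) = 1/n, so the counts should come out roughly equal.

```python
import random
from collections import Counter


def pick_random_line(lines, rng=random):
    """One-pass pick: the nth line replaces the current choice with probability 1/n."""
    chosen = None
    for n, line in enumerate(lines, start=1):
        # For n == 1 this is always true, so the first line is always taken initially.
        if rng.random() < 1.0 / n:
            chosen = line
    return chosen


def tally(num_lines=5, trials=100_000, seed=0):
    """Run the pick many times over a small made-up data set and count the winners."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lines = [f"line {i}" for i in range(1, num_lines + 1)]
    return Counter(pick_random_line(lines, rng) for _ in range(trials))


if __name__ == "__main__":
    for line, count in sorted(tally().items()):
        print(line, count)
```

If the method were biased toward the start or the end of the file, the counts for the first or last lines would drift visibly away from trials / num_lines.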

    —John