in reply to Re: Random sampling a variable record-length file.
in thread Random sampling a variable record-length file.

I follow your meaning, but I don't think it gels with the theory of Normal distributions & Sampling

buckets of rivets rather than CSV files. :-)

I think that the distinction is that grabbing a handful from a bucket of rivets does not imply any positional correlation between the elements of the sample--they tend to mix random(ish)ly as they fall into the bucket.

Machine tools tend to wear with use, so it's pretty standard practice to set up the machine tool to operate at one end of the tolerance, so that as the tool wears, it slowly drifts towards the other end. If you took a sample entirely from the beginning of the run--or the end--then the sample would not be representative--in terms of average/mean/mode/variance--of the entire run.

But grabbing a handful from the collection hopper, where they will have tended to mix randomly, should be representative.

Similarly, a contiguous sequence from the beginning, end, or middle of a file is less likely to be a representative sample than one picked at random from the entire file.
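
To put rough numbers on that, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the linear drift from 9.95 to 10.05, the noise level, the run length, and the helper names `simulate_run`, `contiguous_sample` and `random_sample`. It simulates a run whose values drift with "tool wear", then compares contiguous samples against one drawn at random from the whole run.

```python
import random
import statistics


def simulate_run(n=10_000, start=9.95, end=10.05, noise=0.005, seed=1):
    """Hypothetical run: the measured value drifts linearly from `start`
    to `end` as the tool wears, with a little Gaussian noise on top."""
    rng = random.Random(seed)
    step = (end - start) / (n - 1)
    return [start + step * i + rng.gauss(0, noise) for i in range(n)]


def contiguous_sample(records, k, offset=0):
    """k consecutive records starting at `offset` (positionally correlated)."""
    return records[offset:offset + k]


def random_sample(records, k, seed=2):
    """k records picked uniformly at random, without replacement."""
    return random.Random(seed).sample(records, k)


run = simulate_run()
k = 200

full_mean = statistics.mean(run)
full_sd = statistics.pstdev(run)

head = contiguous_sample(run, k)                       # start of the run
tail = contiguous_sample(run, k, len(run) - k)         # end of the run
middle = contiguous_sample(run, k, (len(run) - k) // 2)
picked = random_sample(run, k)                         # "handful from the hopper"

# How far each sample's mean is from the mean of the whole run.
head_err = abs(statistics.mean(head) - full_mean)
tail_err = abs(statistics.mean(tail) - full_mean)
rand_err = abs(statistics.mean(picked) - full_mean)

# A middle slice can land near the right mean yet badly understate the spread.
mid_sd = statistics.pstdev(middle)
rand_sd = statistics.pstdev(picked)

print(f"mean error: head={head_err:.4f} tail={tail_err:.4f} random={rand_err:.4f}")
print(f"std dev:    full={full_sd:.4f} middle={mid_sd:.4f} random={rand_sd:.4f}")
```

Under these assumed parameters, the head and tail slices should miss the overall mean by nearly half the drift range, and the middle slice should show only a fraction of the true spread, while the random pick should track both reasonably well. The exact figures depend entirely on the invented drift and noise values.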

