If you pick a random byte position and then use the line containing that byte as your pick, each line is chosen with probability proportional to its length. Compared with a fair 1/N pick, a line's odds are scaled by its length divided by the average line length in the file.
An easy way to visualize this is to imagine line 1 with 1 character, and line 2 with 99 characters. Ignoring line endings, you'll get the first line 1% of the time, and the second line 99% of the time.
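To make the bias concrete, here is a minimal sketch in Python under some simplifying assumptions. Lines are held in memory as ASCII strings, so one character equals one byte, and line terminators are ignored to match the example above. The helper names (`line_for_offset`, `selection_probabilities`, `simulate`) are hypothetical and exist only for illustration. The sketch computes the exact chance that a uniformly random byte falls in each line, and also simulates the "random byte, then the enclosing line" approach.

```python
import random
from bisect import bisect_right
from collections import Counter
from fractions import Fraction
from itertools import accumulate


def line_for_offset(starts, offset):
    """Return the index of the line whose byte range contains `offset`.

    starts[i] is the byte offset where line i begins (starts[0] == 0).
    """
    return bisect_right(starts, offset) - 1


def selection_probabilities(lines):
    """Exact chance that a uniformly random byte lands in each line."""
    total = sum(len(line) for line in lines)
    return [Fraction(len(line), total) for line in lines]


def simulate(lines, trials, seed=0):
    """Pick a random byte offset `trials` times and count which line is hit."""
    rng = random.Random(seed)
    lengths = [len(line) for line in lines]
    # Start offset of each line: 0, len0, len0 + len1, ...
    starts = [0] + list(accumulate(lengths))[:-1]
    total = sum(lengths)
    hits = Counter()
    for _ in range(trials):
        offset = rng.randrange(total)  # uniform over all bytes
        hits[line_for_offset(starts, offset)] += 1
    return hits


if __name__ == "__main__":
    lines = ["a", "b" * 99]  # a 1-byte line and a 99-byte line, no terminators
    print(selection_probabilities(lines))  # [Fraction(1, 100), Fraction(99, 100)]
    print(simulate(lines, 100_000))        # roughly 1% vs 99% of the hits
```

In a real file, each line's terminator also counts toward its length, so even an empty line has a small nonzero chance of being picked. The proportionality to length is unchanged.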