in reply to Estimating Vocabulary

Well, here's an alternative. It's a complete waste of cycles, since it rereads the whole file once per word and so scales linearly with the number of words returned; on the other hand, it isn't bounded by the size of the dictionary, since it never holds the whole word list in memory. As written, it can also return duplicates.
my(@lines, $line);
open(FILE, shift) || die;
until( scalar @lines == $ARGV[0] ){
    seek(FILE, 0, $. = 0);                       # rewind and reset the line counter
    rand($.) < 1 && ($line = $_) while <FILE>;   # perlfaq5 trick: keep each line with probability 1/$.
    push(@lines, $line);
}
print @lines, "wc -l could have told you this is $. words\n";
It's based on "How do I select a random line from a file?" in perlfaq5. I'd be interested to see whether anybody has a better way of extending this algorithm to return multiple entries.
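
For what it's worth, the usual single-pass way to extend the perlfaq5 trick to several lines is reservoir sampling: keep the first k requested lines as the initial pool, then let each later line overwrite a random slot with steadily decreasing probability. What follows is only a sketch along those lines (not code from the thread), taking the file name and the count in the same order as the snippet above:

my ($file, $k) = @ARGV;
open(FILE, $file) || die;
my @lines;
while (<FILE>) {
    if (@lines < $k) {
        push @lines, $_;                      # fill the pool with the first $k lines
    }
    else {
        my $slot = int rand($.);              # random slot in 0 .. $.-1
        $lines[$slot] = $_ if $slot < $k;     # i.e. keep line $. with probability $k/$.
    }
}
print @lines, "wc -l could have told you this is $. words\n";

Single pass, no duplicates, and memory stays proportional to the number of words requested rather than to the size of the dictionary.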

--
perl -pe "s/\b;([st])/'\1/mg"

Replies are listed 'Best First'.
Re: Re: Estimating Vocabulary
by I0 (Priest) on Mar 28, 2002 at 00:41 UTC
    my(@lines, $line);
    open(FILE, shift) || die;
    1 while <FILE>;                  # first pass: count the lines
    $line = $.;
    seek(FILE, 0, $. = 0);           # rewind and reset the line counter
    # second pass: probabilistically keep lines until $ARGV[0] have been picked
    rand($line - $.) < $ARGV[0] - @lines && push(@lines, $_) while <FILE>;
    print @lines, "wc -l could have told you this is $. words\n";
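
    The idea is selection sampling: the first pass only counts the lines, and the second pass keeps each line with a probability driven by how many picks are still needed versus how many lines remain, so it ends up with exactly $ARGV[0] lines and no duplicates, as long as the file has that many. A hypothetical invocation (script name and dictionary path assumed, not from the thread) would be:

        perl pickwords.pl /usr/share/dict/words 10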
      UPDATE: Excellent!

      WAS: That does not appear to work; I ask for one line and get 13-18 lines... It is also heavily weighted towards the Zs.

      --
      perl -pe "s/\b;([st])/'\1/mg"

        Are you sure? It's working for me with only a small bias towards the Zs.

        Update: Apparently, the observed bias was mostly an artifact of small sample size.
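
        One way to check that empirically (a sketch of mine, not from the thread, with toy parameters assumed) is to rerun the same selection loop over an in-memory list many times and tally how often each position is picked; an unbiased sampler should pick every position in about $K/$N of the runs:

            my ($N, $K, $TRIALS) = (26, 1, 100_000);   # toy list size, picks per run, number of runs
            my @hits = (0) x $N;
            for (1 .. $TRIALS) {
                my $chosen = 0;
                for my $i (1 .. $N) {                  # $i plays the role of $. on the second pass
                    if (rand($N - $i) < $K - $chosen) {
                        $hits[$i - 1]++;
                        $chosen++;
                    }
                }
            }
            printf "position %2d: %6d picks (expected ~%d)\n",
                $_ + 1, $hits[$_], $TRIALS * $K / $N for 0 .. $N - 1;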