in reply to Re: Re: Estimating Vocabulary
in thread Estimating Vocabulary

UPDATE: Excellent!

WAS: That does not appear to work. I ask for one line and get 13-18 lines. It is also heavily weighted towards the Zs.

--
perl -pe "s/\b;([st])/'\1/mg"

Re: Re: Re: Re: Estimating Vocabulary
by I0 (Priest) on Mar 28, 2002 at 01:39 UTC
    Are you sure? It's working for me, with only a small bias towards the Zs.

    Update: Apparently, the observed bias was mostly an artifact of small sample size.
      On a Linux 2.4.9 box running Perl 5.6.0, the command perl /tmp/a /usr/share/dict/words 1 yields such things as:

      databases fritter Saracens stammerer when Whitmanize willing writes Wuhan youthfully zigzag Zoroaster Zulu Zulus Zurich wc -l could have told you this is 45424 words
      shrivel topologies wetter Wilkins wristwatch Yeager yellowed zoom zooms Zoroastrian Zulu Zulus Zurich wc -l could have told you this is 45424 words
      valuably wins wriggles Zennist zoning Zoroaster Zulu Zulus Zurich wc -l could have told you this is 45424 words
      knave requisitioning seismology sentimentally tail Telnet Welles Whipple winner workbooks workmen Yates yeas Yokohama zodiac zonally zone Zoroaster Zoroastrian Zulu Zulus Zurich wc -l could have told you this is 45424 words

      --
      perl -pe "s/\b;([st])/'\1/mg"

        I can't duplicate your problem. It should be impossible since $ARGV[0]-@lines would be negative while rand($line-$.) is positive.
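
The comparison described above looks like selection sampling: each line is kept when a random draw over the lines still unread falls below the number of lines still needed. Here is a minimal sketch of that idea in Python, not the Perl script under discussion (which is not shown in this thread). The name sample_lines and its variables are hypothetical, and it assumes the total line count is known up front.

```python
import random


def sample_lines(lines, k, rng=random):
    """Pick k items from lines, in their original order (selection sampling).

    Hypothetical helper for illustration; assumes len(lines) is known.
    """
    total = len(lines)
    chosen = []
    for seen, line in enumerate(lines):
        needed = k - len(chosen)    # plays the role of $ARGV[0]-@lines
        remaining = total - seen    # lines not yet examined, this one included
        # rng.random() * remaining is never negative, so once needed
        # reaches 0 this test fails for every later line.
        if rng.random() * remaining < needed:
            chosen.append(line)
    return chosen


if __name__ == "__main__":
    words = ["word%d" % i for i in range(45424)]
    print(sample_lines(words, 1))
```

When needed equals remaining, the test always succeeds, so the sketch returns exactly k lines whenever k is between 0 and the line count. That matches the point made above: the draw is never negative, so nothing more can be selected once the quota is filled.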