in reply to reading/writing to a file

The above solutions look good, but just to mention another tool in the toolbox: if this is on *nix, keep in mind the sort and uniq commands. For example, your Perl could just create a raw dictionary file without worrying about duplicates (and thus eliminate the need for a possibly very large in-memory hash), and then invoke something like:
system("sort raw_outfile | uniq > real_outfile"); unlink "raw_outfile";
Not sure if it's the best use here, but in general sort/uniq on the cmdline is very useful.
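
A minimal sketch of that approach, assuming the new words arrive one per line on STDIN (the raw_outfile/real_outfile names just follow the snippet above and are illustrative):

use strict;
use warnings;

# Append every word to the raw file without checking for duplicates,
# so no large in-memory hash is needed.
open my $raw, '>>', 'raw_outfile' or die "open raw_outfile: $!";
while ( my $word = <STDIN> ) {
    print {$raw} $word;
}
close $raw or die "close raw_outfile: $!";

# Let sort/uniq do the deduplication on disk.
system("sort raw_outfile | uniq > real_outfile") == 0
    or die "sort/uniq failed: $?";
unlink 'raw_outfile' or warn "unlink raw_outfile: $!";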

Re^2: reading/writing to a file
by tlm (Prior) on Jun 18, 2005 at 20:14 UTC

    I agree that using Unix utilities is a good alternative for this problem, but note that

    • For the purpose of this problem at least (and, AFAIK, always),
      sort foo.txt | uniq
      can be replaced with a single sort command:
      sort -u foo.txt
    • By itself, sort -u (or sort ... | uniq) is not enough to solve this problem. Something like GNU's comm is also required (zsh, YMMV; a Perl rendering of the same pipeline follows this list):
      % (comm -2 -3 sorted_new.txt sorted_exclude.txt; < dict.txt) | sort -u > tmp
      % mv tmp dict.txt
      (BTW, if anyone knows how to avoid the temporary file above, I'd love to hear about it.)
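
    Here is a hedged sketch of the same update driven from Perl. Since Perl's system() hands the command to /bin/sh rather than zsh, the zsh shorthand "< dict.txt" is replaced with cat dict.txt; all the filenames are illustrative:

      # Assumes new.txt holds the new words, exclude.txt the words to drop,
      # and dict.txt the dictionary being maintained (names are illustrative).
      system("sort -u new.txt > sorted_new.txt")         == 0 or die "sort new: $?";
      system("sort -u exclude.txt > sorted_exclude.txt") == 0 or die "sort exclude: $?";

      # comm -2 -3 keeps lines found only in the first (sorted) file,
      # i.e. new words that are not on the exclude list; the result is
      # merged with the existing dictionary and deduplicated.
      system("(comm -2 -3 sorted_new.txt sorted_exclude.txt; cat dict.txt)"
           . " | sort -u > tmp") == 0 or die "pipeline failed: $?";
      rename 'tmp', 'dict.txt' or die "rename tmp: $!";

    Note that this still goes through the temporary file, just with rename instead of mv.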

    the lowliest monk