in reply to Re^3: a question about making a word frequency matrix
in thread a question about making a word frequency matrix

I'd just like to spread information about internationalization for the American monk who naïvely thinks other languages all use 8859_1 with just a handful of accented letters.

I apologize. This was indeed a bit rude of me:

I admit this encoding problem is just a minor nit, and that it's not central to the problem of the OP. There's just one reason why I mentioned it: you included accented characters in your examples.

It was a 5-minute throwaway script I just tossed off to give an idea of how the problem could be approached. Sorry I can't live up to the high standards of ambrus, who naively believes that every quick-and-dirty one-off script should be perfect in every way and cover every eventuality.

Very true. I often fall into this mistake.
