
Quote:
I'd just like to spread information about internationalization for the American monk who naïvely thinks other languages all use ISO 8859-1 with just a handful of accented letters.

Well, that was a pretty far leap. It's true that if you use this script, as written, to read files that aren't in the encoding it expects, you will almost certainly get wrong results. Perhaps I should have mentioned that. But looking through my post, I can't find the spot where I say "This is the best and only way to do this, and it will deal with all possible data sets without modification."

It was a five-minute throwaway script I tossed off to give an idea of how the problem could be approached. Sorry I can't live up to the high standards of ambrus, who naively believes that every quick and dirty one-off script should be perfect in every way and cover every eventuality.


Re^4: a question about making a word frequency matrix
by ambrus (Abbot) on Dec 08, 2005 at 17:03 UTC
    I'd just like to spread information about internationalization for the American monk who naïvely thinks other languages all use ISO 8859-1 with just a handful of accented letters.

    I apologize. That was indeed a bit rude of me.

    I admit this encoding problem is just a minor nit and that it's not central to the OP's problem. There was just one reason I mentioned it: you included accented characters in your examples.

    It was a five-minute throwaway script I tossed off to give an idea of how the problem could be approached. Sorry I can't live up to the high standards of ambrus, who naively believes that every quick and dirty one-off script should be perfect in every way and cover every eventuality.

    Very true. I often fall into this mistake.