in reply to Re: What Is A Word?
I already mentioned that in different words - do you think it needs to be more clear?
The seeker may want to define their own character class - perhaps removing _ and 0-9 from \w but adding the apostrophe and hyphen, so that it matches words like "don't" and "president-elect" and does not match strings like "th_500X". You will need to point out that such a class will still match "Z-''-Z".

Regarding the Unicode comment: I mentioned encoding and dealing with foreign languages in passing because the same pitfalls still apply. If you only want "real" words found in some dictionary, properly handling Unicode by itself is not going to fix the problem that "aaaaaa" and its grave-accented counterpart still match even though neither is a real word.
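To make the point about a custom character class concrete, here is a minimal sketch. It is written in Python purely for illustration; the names `WORD`, `UNICODE_WORD`, `is_word` and `is_word_unicode` are hypothetical, and the Unicode-aware variant is only an approximation of "any letter".

```python
import re

# Hypothetical custom "word" class: ASCII letters plus apostrophe and hyphen.
# Underscore and digits, which \w would accept, are deliberately left out.
WORD = re.compile(r"[A-Za-z'-]+")

# Rough Unicode-aware variant (an assumption, not a definitive definition):
# [^\W\d_] is "whatever \w matches, minus digits and underscore".
UNICODE_WORD = re.compile(r"(?:[^\W\d_]|['-])+")


def is_word(candidate: str) -> bool:
    """True if the whole string is made only of characters in the ASCII class."""
    return WORD.fullmatch(candidate) is not None


def is_word_unicode(candidate: str) -> bool:
    """True if the whole string is made only of characters in the Unicode-aware class."""
    return UNICODE_WORD.fullmatch(candidate) is not None


if __name__ == "__main__":
    samples = ["don't", "president-elect", "th_500X", "Z-''-Z", "aaaaaa", "àààààà"]
    for sample in samples:
        print(sample, is_word(sample), is_word_unicode(sample))
```

The class accepts "don't" and "president-elect" and rejects "th_500X", but it also accepts "Z-''-Z" and "aaaaaa". Switching to the Unicode-aware variant only widens what gets through; it does nothing to decide whether a string is a real word.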
In other words, I am saying that what the seeker may want is very subjective and only they can answer the questions necessary to provide an adequate solution. Forgetting to mention encoding will complicate the problem but knowing about it won't necessarily make the problem go away - the entire picture is required.
Thank you for your comment and the Unicode link - a good tool indeed.
Cheers - L~R