argv has asked for the wisdom of the Perl Monks concerning the following question:
At first, it seemed simple to just check for normal ASCII, but then it occurred to me that I want to accept certain accented characters, like the é in café, and so on...
Before I go off writing some routine that checks for sanity in a string to see if it really is English text instead of arbitrary gobbledygook, I figured maybe someone already had such a thing. Even if I only look at the first N characters in a string, that'd be fine.
Again, the brute force intuitive step would be to just do something like
$string =~ /([\s\w]{25})/
but this seems like a hornet's nest of little gotchas where people have learned it ain't that simple.
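One way to sidestep some of those gotchas is to score a sample of the string rather than demand an exact match. The following is only a rough sketch, not a vetted solution: `looks_like_text` is a hypothetical helper name, the 25-character sample and 0.9 threshold are arbitrary assumptions, and it assumes the string has already been decoded into characters (not raw bytes). It uses Unicode properties instead of `\w`, since whether `\w` matches é on a plain byte string can depend on the Perl version and locale.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns 1 if the first $n
# characters of $text are mostly letters, marks, digits, punctuation or
# whitespace, and 0 otherwise.
# Assumption: $text is a decoded character string, not raw bytes.
sub looks_like_text {
    my ($text, $n, $threshold) = @_;
    $n         = 25  unless defined $n;          # sample size (arbitrary)
    $threshold = 0.9 unless defined $threshold;  # fraction that must look "texty"

    my $sample = substr($text, 0, $n);
    return 0 if length($sample) == 0;

    # Count the characters that plausibly belong to human-readable text.
    # \p{L} covers accented letters such as the e-acute in "cafe".
    my $ok = () = $sample =~ /[\p{L}\p{M}\p{N}\p{P}\s]/g;

    return ($ok / length($sample)) >= $threshold ? 1 : 0;
}

print looks_like_text("A caf\x{e9} on the corner")            ? "text\n" : "garbage\n";
print looks_like_text("\x00\x01\x02\x03\x7f\x80\x90\x1b" x 4) ? "text\n" : "garbage\n";
```

Note that this only says the sample is made of text-like characters; it says nothing about whether it is English. Bytes in the 0xC0-0xFF range also count as Latin-1 letters under these rules, so undecoded binary data can still slip through.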
Replies are listed 'Best First'.
Re: string? Or binary garbage?
by Albannach (Monsignor) on Dec 01, 2004 at 00:49 UTC
by cog (Parson) on Dec 01, 2004 at 09:30 UTC

Re: string? Or binary garbage?
by davido (Cardinal) on Dec 01, 2004 at 00:28 UTC
by argv (Pilgrim) on Dec 01, 2004 at 01:11 UTC
by argv (Pilgrim) on Dec 01, 2004 at 01:57 UTC
by hakkr (Chaplain) on Dec 01, 2004 at 15:55 UTC

Re: string? Or binary garbage?
by Anonymous Monk on Dec 02, 2004 at 13:06 UTC