Thanks. That snippet works as-is, but the text I am actually fetching still does not match. The data is fetched over HTTP from WordPress. If I save it to a file and run the 'file' utility on it, I get the output "HTML document, UTF-8 Unicode text" in every case. Yet when I process the file with Perl, the \w pattern misses non-ASCII letters.
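
A minimal sketch of one possible cause, assuming the fetched content is still a string of undecoded bytes when the regex runs: 'file' only inspects the bytes on disk, while Perl treats input as bytes until it is told the encoding. The sample string below is made up for illustration (the UTF-8 bytes for "café") and is not taken from the WordPress data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the raw UTF-8 bytes for "café", as they might
# arrive undecoded from an HTTP response body or a plain open().
my $bytes = "caf\xC3\xA9";

# On an undecoded byte string, \w uses ASCII rules for bytes above 0x7F
# (no /u flag and no 'unicode_strings' feature in effect here), so the
# two bytes of the encoded letter are not word characters.
my $bytes_match = ($bytes =~ /^\w+$/) ? 1 : 0;

# Decoding turns the bytes into Perl characters; \w then applies
# Unicode rules and accepts the accented letter.
my $text       = decode('UTF-8', $bytes);
my $text_match = ($text =~ /^\w+$/) ? 1 : 0;

printf "bytes: length %d, all \\w: %s\n", length($bytes), $bytes_match ? 'yes' : 'no';
printf "text:  length %d, all \\w: %s\n", length($text),  $text_match  ? 'yes' : 'no';
```

If this is the cause, decoding at the point of input should be enough, for example by opening the saved file with the ':encoding(UTF-8)' layer. If the fetch happens to go through LWP (an assumption, since the fetching code is not shown), HTTP::Response's decoded_content method returns the body already decoded.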
In reply to Re^2: UTF8 versus \w in pattern matching (basic test)
by mldvx4
in thread UTF8 versus \w in pattern matching
by mldvx4