in reply to Re: UTF8 versus \w in pattern matching (basic test)
in thread UTF8 versus \w in pattern matching
Thanks. That snippet works as-is, but the text I am actually getting still does not. The data is fetched over HTTP from WordPress. If I save it to a file and run the 'file' utility on it, I get the output "HTML document, UTF-8 Unicode text" for everything. Yet when I process the file with Perl, the \w pattern misses non-ASCII letters.
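A minimal sketch of one way this symptom can arise, assuming (this is a guess, not something established in the post) that the fetched bytes are matched without first being decoded into characters. The sample bytes and variable names are made up for illustration; they are not taken from the actual WordPress data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the raw UTF-8 bytes for "blåbær", as they might
# arrive from an HTTP response or a file read without an encoding layer.
my $bytes = "bl\x{C3}\x{A5}b\x{C3}\x{A6}r";

# Matched as bytes: under Perl's default rules for undecoded strings,
# bytes above 0x7F are not word characters, so the word is split up.
my @raw = $bytes =~ /(\w+)/g;

# Decoded into characters first: \w now sees the letters themselves.
my $text    = decode('UTF-8', $bytes);
my @decoded = $text =~ /(\w+)/g;

printf "undecoded: %d fragment(s)\n", scalar(@raw);
printf "decoded:   %d word(s), %d characters\n",
    scalar(@decoded), length($decoded[0]);
```

If this assumption holds, the 'file' result and the \w failure are consistent with each other: the bytes on disk are valid UTF-8, but Perl treats them as individual bytes until they are decoded, for example with Encode's decode or an :encoding(UTF-8) layer on the filehandle.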
Re^3: UTF8 versus \w in pattern matching (basic test)
by haj (Vicar) on Jul 06, 2021 at 12:25 UTC

Re^3: UTF8 versus \w in pattern matching (basic test)
by LanX (Saint) on Jul 06, 2021 at 12:56 UTC
by mldvx4 (Hermit) on Jul 06, 2021 at 13:03 UTC
by LanX (Saint) on Jul 06, 2021 at 15:23 UTC
by jo37 (Curate) on Jul 06, 2021 at 16:18 UTC
by haj (Vicar) on Jul 06, 2021 at 17:54 UTC
by jo37 (Curate) on Jul 06, 2021 at 18:03 UTC
by ikegami (Patriarch) on Jul 06, 2021 at 21:07 UTC