in reply to Re^7: UTF8 versus \w in pattern matching
in thread UTF8 versus \w in pattern matching
I've been trying to produce a real SSCCE. Here's one more attempt. When I fetch one of the source files with 'curl' directly to a file, open that file in Emacs, and whittle it down to a few letters as in the script below, I get the output $VAR1 = "t\x{f3}n";. That does not look like UTF-8 to me.
#!/usr/bin/perl
use utf8;
use Data::Dumper;
use warnings;
use strict;

my $a = "tón";
print Dumper($a), qq(\n);
Is there a standard way to identify 8-bit, legacy text (which has been mislabeled upstream as UTF-8) and convert it into UTF-8 for continued work with regex?
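For reference, here is the kind of thing I have in mind, as a minimal sketch rather than a known standard method. It tries a strict UTF-8 decode first and, if the bytes are not valid UTF-8, falls back to a legacy 8-bit encoding. The helper name decode_bytes and the choice of cp1252 as the fallback are my own assumptions; the real legacy encoding of the source files may differ, and short legacy strings can occasionally happen to be valid UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from any module): returns decoded text.
# Assumption: anything that is not valid UTF-8 is cp1252 unless told otherwise.
sub decode_bytes {
    my ($bytes, $fallback) = @_;
    $fallback //= 'cp1252';

    # Strict decode: croaks on malformed UTF-8; LEAVE_SRC keeps $bytes intact
    # so it can still be used for the fallback decode.
    my $text = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    # Not valid UTF-8, so treat the bytes as the legacy 8-bit encoding.
    return Encode::decode($fallback, $bytes);
}

binmode(STDOUT, ':encoding(UTF-8)');

my $legacy = "t\xF3n";        # "tón" as Latin-1/cp1252 bytes
my $utf8   = "t\xC3\xB3n";    # "tón" as UTF-8 bytes

for my $raw ($legacy, $utf8) {
    my $text = decode_bytes($raw);
    printf "%s: %d characters, matches \\w+: %s\n",
        $text, length($text), ($text =~ /\A\w+\z/ ? 'yes' : 'no');
}
```

Both inputs should end up as the same three-character string, so a regex using \w can treat them alike; the text is encoded as UTF-8 only on output, via the binmode layer.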
Re^9: UTF8 versus \w in pattern matching
by haj (Vicar) on Jul 06, 2021 at 18:21 UTC
by pryrt (Abbot) on Jul 06, 2021 at 18:49 UTC
by ikegami (Patriarch) on Jul 06, 2021 at 21:01 UTC