isync has asked for the wisdom of the Perl Monks concerning the following question:
    $internal_format_string =~ s/\n//g;

and it removed some letters, spaces and a lot more! Then my thought was that it has to do with the string being in "internal format". So I tried:

    require Encode;
    my $string_in_utf8 = Encode::encode_utf8($internal_format_string);
    $string_in_utf8 =~ s/\n//g;

and it worked again! So it seems perl requires my string to be in utf8, at least to recognize the special \n newline char. But doesn't this prevent me from properly handling the broad range of unicode characters in the regex, on regexes other than the one removing the \n char? So I tried to get back to full unicode processing in my regexes:

    $internal_format_string =~ s/\x{0A}//g;

Which failed (might be because I am using the wrong syntax for the hex operation) (or is the string not in hex but in unicode? \u{000A} failed as well..)
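
For comparison, here is a minimal sketch (not the poster's code) of the usual decode, process, encode pattern. The sample bytes and all variable names are made up for illustration. It assumes the input really is UTF-8 and that \n means LF, as it does on common platforms. Under those assumptions, \n and \x{0A} match the same character in a decoded string, so both substitutions should give the same result.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample input: the UTF-8 bytes for two short lines of
# non-ASCII text, each terminated by a newline.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe\n\xE4\xB8\x96\xE7\x95\x8C\n";

# Decode once at the input boundary to get a character string.
my $chars = decode('UTF-8', $bytes);

# Remove newlines from copies of the character string, once with \n
# and once with the hex escape for the same code point (U+000A).
(my $no_nl  = $chars) =~ s/\n//g;
(my $no_hex = $chars) =~ s/\x{0A}//g;

# Encode once at the output boundary to get UTF-8 bytes again.
my $out = encode('UTF-8', $no_nl);

printf "%d chars before, %d chars after, same result: %s\n",
    length $chars, length $no_nl, ($no_nl eq $no_hex ? 'yes' : 'no');
```

If a substitution like this also eats other characters, the data may not be what it is assumed to be (for example, decoded twice or never decoded), which is worth checking before changing the regex.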

Replies are listed 'Best First'.

Re: The unicode / utf8 struggle, part 2: regexes
by Joost (Canon) on May 17, 2007 at 12:03 UTC

Re: The unicode / utf8 struggle, part 2: regexes
by graff (Chancellor) on May 17, 2007 at 14:20 UTC

Re: The unicode / utf8 struggle, part 2: regexes
by isync (Hermit) on May 17, 2007 at 15:40 UTC
by graff (Chancellor) on May 17, 2007 at 18:59 UTC

Re: The unicode / utf8 struggle, part 2: regexes
by mattr (Curate) on May 22, 2007 at 09:41 UTC

Re: The unicode / utf8 struggle, part 2: regexes
by Juerd (Abbot) on Jun 13, 2007 at 19:22 UTC