in reply to Re: Unicode substitution regex conundrum
in thread Unicode substitution regex conundrum
For those who may be seeking the same wisdom, I would like to post the full solution in my case.
My entire "use" section:

    use CGI;
    use CGI::Carp qw(fatalsToBrowser);
    use strict;
    use DBI;
    use Encode;
    use Encode::HanConvert; #Module for dealing with CJK conversions
    use Encode qw(encode decode);
    use POSIX qw(locale_h);
    require Encode::CN;
    require Encode::TW;
    require 5.004;

For incoming form values:

    $name = decode("utf-8", $name);
    $value = decode("utf-8", $value);

For incoming values from the database:

    my $quest = $dbh->prepare($statement, { RaiseError => 1 })
        or die "Cannot prepare statement! $DBI::errstr\n";
    while(@row = $quest->fetchrow_array()) {
        $c1=shift @row;
        $c1=decode("utf-8", $c1);
        ...
    }

And finally, an example of the regex which now functions on multiple languages (UTF8):

    $line =~ s%(?:\p{IsSpace}*)        #Match zero or more spaces
        (\bNOT)?                       #Match zero or one "NOT" operator(s)
        (\(*)                          #Match zero or more left parentheses
        (\p{IsSpace}*|\s*|\b|^)        #Match zero+ spaces or a word boundary
        (?!\")                         #Ensure this doesn't appear beforehand
        ((?:\p{IsWord}|\w|`|           #Match zero+ words
        (?:\&\p{IsAlnum}*\;)*)*        #Include HTML special chars, e.g. &aacute;
        (?:\.\{\d+\})*                 #Include zero+ MySQL-style wildcard '?'s
        (?:\[\.[^\.\]]*\.\])*          #Include zero+ MySQL REGEXP chars
        (?:\[\:[^\:\]]*\:\])*          #Include zero+ MySQL REGEXP special chars
        (?:\[[^\]]*\])*                #Incl. zero+ MySQL REGEXP special patterns
        (?:\*(?!\"))*                  #Include zero+ stand-alone asterisks
        (?:\%(?!\"))* )                #Include zero+ stand-alone percent signs
        (?:\s*|\p{IsSpace}*)           #Match zero+ spaces
        (\)*)                          #Match zero+ right parentheses
        (?:\p{IsSpace}|\s)*            #Match zero+ spaces
        (["()]*)                       #Match zero+ double quotes or parentheses
        (?:\p{IsSpace}*)               #Match zero+ spaces
        (                              #(begin group)
        (?:NOT|OR|AND|XOR)*            #Match zero+ operator words
        (?:\p{IsSpace}+|\(+|\p{IsZ}|\Z) #Then one+ spaces OR one+ ")"
                                       #OR end-of-string
        )                              #(end group)
        #AND SUBSTITUTE THE ABOVE WITH THE BELOW
        %$2$3$table\.$columnName`$1`$like`"$l$wb$4$we$l"$5$6 $7 %xig;

Again, thank you very much! And thank you to Moritz, who prodded me to learn how to code the regex substitution on multiple lines, with comments for readability. Blessings!
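P.S. For anyone who wants to try the decode-then-match step in isolation, here is a minimal sketch. It is not my production code: the sample string, the variable names ($bytes, $text, $quoted, $out) and the much simpler pattern are made up for illustration. It only shows that once the bytes have been decoded, \w under /x matches accented and CJK word characters, and that the result is encoded again on the way out.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: raw UTF-8 bytes, as they might arrive from a form
# field or a database row ("cafe" with an acute accent, NOT, two CJK chars).
my $bytes = "caf\xC3\xA9 NOT \xE4\xB8\xAD\xE6\x96\x87";

# Decode the bytes into Perl characters before doing any matching.
my $text = decode("utf-8", $bytes);

# Put double quotes around every word that is not a boolean operator.
# The /x flag lets the pattern span several lines with comments.
(my $quoted = $text) =~ s{
    (?<!\w)                       # only start at the beginning of a word
    (?!(?:NOT|AND|OR|XOR)\b)      # leave the operator words alone
    (\w+)                         # one or more (Unicode) word characters
}{"$1"}xg;

# Encode back to UTF-8 bytes before printing or storing.
my $out = encode("utf-8", $quoted);
print "$out\n";
```

The order is the whole point: decode on the way in, match on characters, encode on the way out. If the decode step is skipped, the pattern sees individual bytes rather than characters, which is what was tripping me up in the first place.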
~Polyglot~
Re^3: Unicode substitution regex conundrum
by Juerd (Abbot) on Mar 14, 2008 at 01:08 UTC