However, both regexes still don't appear to be hitting the spot. I've created a small test program which pulls an Arabic title from a webpage to demonstrate:

    use LWP::UserAgent;

    $ua = LWP::UserAgent->new;
    my $resp = $ua->get("http://www.englishlink.com/index_ARE_HTML.asp");
    if ($resp->is_success) {
        $mystring = $resp->content;
        $mystring =~ s/.*\<title\>//sgi;
        $mystring =~ s/\<.*//sgi;
    }
    print "$mystring\n";
    if ($mystring =~ m/[^\x00-\x7f]/) {
        print "Contains ASCII only\n";
    } else {
        print "Contains non-ASCII\n";
    }

When run, I'd expect to see the result as "Contains non-ASCII", but instead I get "Contains ASCII only".
Any thoughts as to why?
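
For comparison, here is a minimal offline sketch of the same check that takes the web fetch out of the picture. It assumes hard-coded sample strings rather than the page title, and `classify` is a hypothetical helper that is not part of the test program. Since `[^\x00-\x7f]` matches a character that is not ASCII, the sketch reports "Contains non-ASCII" when the match succeeds. The third sample is only an assumption about what a page might contain: Arabic written as numeric character references is itself pure ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): report whether a
# string holds any character outside the 7-bit ASCII range.
sub classify {
    my ($text) = @_;
    # [^\x00-\x7f] matches one character that is NOT ASCII, so a
    # successful match means non-ASCII content is present.
    return $text =~ /[^\x00-\x7f]/
        ? "Contains non-ASCII"
        : "Contains ASCII only";
}

# Sample inputs (assumptions for illustration, not taken from the page):
my @samples = (
    "English Link",                        # plain ASCII
    "\x{645}\x{631}\x{62D}\x{628}\x{627}", # Arabic letters as code points
    "&#1605;&#1585;",                      # Arabic as HTML numeric entities
);

# Print only the verdicts, so no wide characters are sent to STDOUT.
print classify($_), "\n" for @samples;
```

If the fetched title behaves differently from the second sample, printing the raw string with something like Data::Dumper would show whether it really holds high bytes or only entity references.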
In reply to Re^3: How to determine if string contains non-ASCII characters ? by roadrunner, in thread How to determine if string contains non-ASCII characters ? by roadrunner