PerlMonks  

Re: Perl detect utf8, iso-8859-1 encoding

by swiftlet (Acolyte)
on Jul 25, 2020 at 09:35 UTC ( [id://11119792]=note )


in reply to Perl detect utf8, iso-8859-1 encoding

I am afraid I do not have the luxury of discarding all non-UTF-8 input, but I can simplify the code:

if the input is not detected as UTF-8, just treat it as ISO-8859-1.

use Text::Unaccent;
use Encode::Detect::Detector;

# my $author = "Sch%F6%E5ttl";
# my $author = "Sch%C3%A9ttl";
# my $author = "Sch%C3%B6ttl";
# my $author = "Sch%F6%F6ttl";
# my $author = "Sch%F6 %F4ttl";
my $author = "teoria elasticit%E0";

$author =~ s/%([a-zA-Z0-9][a-zA-Z0-9])/pack('C',hex($1))/eg;

my $encoding = Encode::Detect::Detector::detect($author);
if ($encoding !~ m#utf-8#i) {
    $encoding = "iso-8859-1";
}
if ($encoding) {
    $author = unac_string($encoding, $author);
    print "after unac: $author<br>\n";
}

It seems to be working better. Are there any potential problems?
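
For comparison, here is a minimal sketch of the same "UTF-8 if it looks like UTF-8, otherwise ISO-8859-1" rule. It uses a strict decode from the core Encode module instead of Encode::Detect::Detector, so it swaps heuristic detection for a validity check. The helper name decode_author is hypothetical, and the percent-unescape pattern is narrowed to hex digits only, which is an assumption about the input rather than something taken from the code above.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: percent-decode the input, then try a strict UTF-8
# decode; if the bytes are not valid UTF-8, fall back to ISO-8859-1.
# Returns the decoded character string and the encoding that was used.
sub decode_author {
    my ($raw) = @_;

    # Only unescape real hex pairs (assumption: anything else is literal).
    (my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

    # FB_CROAK makes decode() die on malformed input; eval traps that.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    if (defined $text) {
        return ($text, 'UTF-8');
    }

    # Every byte maps to a character in ISO-8859-1, so this cannot fail.
    return (decode('ISO-8859-1', $bytes), 'ISO-8859-1');
}

# Sample inputs taken from the test strings in the post.
for my $sample ("Sch%C3%B6ttl", "Sch%F6%F6ttl", "teoria elasticit%E0") {
    my ($text, $encoding) = decode_author($sample);
    print "$sample decoded as $encoding\n";
}
```

One caveat with this kind of fallback: an ISO-8859-1 byte sequence that happens to be valid UTF-8 will be read as UTF-8. That is rare in real names but not impossible. The decoded string would still need to be encoded again before being handed to anything that expects bytes in a named charset.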
