Nodonomy has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
use strict;
use utf8;
use Text::Unidecode;

# I download text with embedded HTML numbers, as in
# "advice to Gwenda to &#8220;let sleeping murder lie.&#8221;"
#
# 8220 is a decimal number for 'left double quotation mark'.
#
# To convert this HTML number to a double quote, one can use
# the Perl module Text::Unidecode. It seems however that one
# needs to use a hexadecimal number (in this case '201c')
# for the unidecode() function.

print "Function unidecode() prints the 'left double quotation mark': ";
print unidecode("\x{201c}") . "\n";

# I can get the hexadecimal number as follows:
my $hexval = sprintf("%x", 8220);

# MY ACTUAL QUESTION: how to get $hexval into the call to
# unidecode()? unidecode("\x\{".$hexval."\}") does not do
# the job, and I have tried half a dozen other ways that I
# thought would work, but do not.
# Does anyone know how to do this?

Replies are listed 'Best First'.
Re: Text:Unidecode question
by Anonyrnous Monk (Hermit) on Jan 28, 2011 at 14:29 UTC
    #!/usr/bin/perl -wl
    use strict;
    use HTML::Entities;
    use Text::Unidecode;

    my $text = "advice to Gwenda to &#8220;let sleeping murder lie.&#8221;";

    print unidecode( decode_entities($text) );
    # advice to Gwenda to "let sleeping murder lie."

    (i.e., decode_entities() returns a Unicode string, which you then pass to unidecode() to transliterate to ASCII)

    Alternatively, with home-brewed decoding (--> chr is the function you were looking for):

    #!/usr/bin/perl -wl
    use strict;
    use Text::Unidecode;

    my $text = "advice to Gwenda to &#8220;let sleeping murder lie.&#8221;";

    $text =~ s/&#(\d+);/chr $1/ge;

    print unidecode($text);
    # advice to Gwenda to "let sleeping murder lie."
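
A minimal sketch of why chr answers the original question, using only core Perl so it runs without either module. A "\x{...}" escape is resolved when the string literal is compiled, so it cannot be assembled from a variable at run time; a hex string such as $hexval has to go through hex() and chr() instead, and chr(8220) takes the decimal number directly. The helper name decode_numeric_refs is hypothetical (it is not part of Text::Unidecode or HTML::Entities), and the hexadecimal form &#x201D; is an assumption added for illustration; the thread only shows decimal references.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from any module in this thread: decode decimal
# (&#8220;) and hexadecimal (&#x201D;) numeric character references in one
# pass, so text produced by one substitution is never decoded a second time.
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/defined $1 ? chr(hex($1)) : chr($2)/ge;
    return $text;
}

# The hex string from the question, turned back into a character.
my $hexval   = sprintf("%x", 8220);   # the string "201c"
my $from_hex = chr(hex($hexval));     # hex string -> number -> character
my $from_dec = chr(8220);             # decimal number -> character directly

my $decoded = decode_numeric_refs("&#8220;let sleeping murder lie.&#x201D;");

# Encode output as UTF-8 so the curly quotes print without a warning.
binmode(STDOUT, ':encoding(UTF-8)');
print $from_hex eq "\x{201c}" ? "same character\n" : "different character\n";
print "$decoded\n";
```

The resulting string can then be handed to unidecode() exactly as in the examples above if ASCII output is wanted.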
      Yes! The first solution is all that I have tried at this point; i.e., unidecode(decode_entities($text)). It works like a charm indeed. Thank you "Anonymous Monk" specifically and Perl Monks in general. What a nice solution, thanks to those who wrote the two modules HTML::Entities and Text::Unidecode. -Node from Nodonomy
Re: Text:Unidecode question
by Anonymous Monk on Jan 28, 2011 at 13:59 UTC
      I will try it out. Looks like it will reduce the work effort too, since one does not have to pull the HTML number out of the text, convert it, put it back, or anything like that.
        Doesn't work correctly. Result: advice to Gwenda to “let sleeping murder lie.” I'll look into this more, as it would be terrific if this would work as you suggested. Any ideas on your side?
Re: Text:Unidecode question
by Anonymous Monk on Feb 01, 2012 at 17:03 UTC
    Text::Unidecode has been actively unmaintained for over 10 years now. The author not only ignores bug reports and patches, but also rejects requests for co-maintainership, so nobody else can fix it.

      heh - "actively unmaintained". Love it.

        What's the difference between it and passive maintenance?