I think it's time for a benchmark here:

Using Perl 5.8.0 on Linux (Mandrake 9.0), on a rather fast machine (dual-processor Athlon 1.8):

#!/bin/perl -w
use strict;

use Benchmark( 'cmpthese');
use Encode;
use Text::Iconv;
use Unicode::Map8;
use Unicode::String qw(utf8);
use utf8;

my $enc= 'latin1';

my $convert_iconv   = Text::Iconv->new( 'utf8', $enc);
my $convert_unicode = Unicode::Map8->new ($enc);

my $text= <DATA>;
chomp $text;

# let's just check the output!
print "Encode : ", encode("iso-8859-1", $text), "\n";
print "Text::Iconv : ", $convert_iconv->convert( $text), "\n";
print "Unicode::Map8 : ", $convert_unicode->to8 (utf8($text)->ucs2), "\n";
print "regexp : ", latin1( $text), "\n";

# now benchmark
cmpthese( 500000, {
    'Encode'        => sub { encode("iso-8859-1", $text); },
    'Text::Iconv'   => sub { $convert_iconv->convert( $text); },
    'Unicode::Map8' => sub { $convert_unicode->to8 (utf8($text)->ucs2); },
    'regexp'        => sub { latin1( $text); },
});

sub latin1 {
    my $text=shift;
    $text=~s{([\xc0-\xc3])(.)}{
        my $hi = ord($1);
        my $lo = ord($2);
        chr((($hi & 0x03) <<6) | ($lo & 0x3F))
    }ge;
    return $text;
}

__DATA__
texte soupçonné d'être plein de caractères accentués

Results:

Encode : texte soupçonné d'être plein de caractères accentués
Text::Iconv : texte soupçonné d'être plein de caractères accentués
Unicode::Map8 : texte soupçonné d'être plein de caractères accentués
regexp : texte soupçonné d'être plein de caractères accentués
Benchmark: timing 500000 iterations of Encode, Text::Iconv, Unicode::Map8, regexp...
       Encode:  6 wallclock secs ( 4.91 usr +  0.02 sys =  4.93 CPU) @ 101419.88/s (n=500000)
  Text::Iconv:  2 wallclock secs ( 2.20 usr +  0.00 sys =  2.20 CPU) @ 227272.73/s (n=500000)
Unicode::Map8:  7 wallclock secs ( 7.66 usr +  0.00 sys =  7.66 CPU) @ 65274.15/s (n=500000)
       regexp:  6 wallclock secs ( 5.65 usr +  0.01 sys =  5.66 CPU) @ 88339.22/s (n=500000)
                  Rate Unicode::Map8 regexp Encode Text::Iconv
Unicode::Map8  65274/s            --   -26%   -36%        -71%
regexp         88339/s           35%     --   -13%        -61%
Encode        101420/s           55%    15%     --        -55%
Text::Iconv   227273/s          248%   157%   124%          --

Note: I am not an expert in using Benchmark, so please let me know if my test is flawed.


In reply to Re: Re: Re: Re: Unicode and locales by mirod
in thread Unicode and locales by moxliukas
