in reply to Re^3: The Björk Situation
in thread The Björk Situation

Actually, now that I've had a moment to look at it, unidecode DOESN'T fare so well, strictly from a speed point of view.

You made the mistake of modifying $string directly, so in all but the first call there are NO characters left to transliterate, and it benchmarked much faster. Once that is fixed, it doesn't have such a big lead. (Actually, none at all ;-) )

Copying the string inside the benchmarked sub, so that every call has something to transliterate:

unidecode => sub { my $text = $string; return unidecode($text); },

Yields:

             Rate unidecode deaccent2  deaccent
unidecode  6797/s        --       -3%      -87%
deaccent2  6979/s        3%        --      -86%
deaccent  50687/s      646%      626%        --

Nevertheless, unidecode probably IS the best choice, as it handles Unicode up to \xFFFF, not just up to \xFF.
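
For anyone who wants to see the pitfall without installing anything, here is a minimal sketch. It assumes, as I understand it, that unidecode rewrites its argument in place when called in void context. strip_accents below is a hypothetical stand-in that mimics only that behaviour with wantarray; it is not the real Text::Unidecode code, and it only knows two characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a transliterator that, like unidecode (as assumed
# here), rewrites its argument in place when called in void context.
sub strip_accents {
    if (!defined wantarray) {            # void context: modify the caller's variable
        $_[0] =~ tr/\x{e9}\x{f6}/eo/;    # @_ aliases the caller's arguments
        return;
    }
    my $copy = $_[0];                    # scalar or list context: work on a copy
    $copy =~ tr/\x{e9}\x{f6}/eo/;
    return $copy;
}

my $string = "Bj\x{f6}rk";

# Flawed benchmark body: after the first call $string is already plain ASCII.
my $flawed = sub { strip_accents($string) };

# Fixed body: copy first, so every call has real work to do.
my $fixed = sub { my $text = $string; return strip_accents($text) };

# Calls below are in void context, as in a benchmark loop that discards results.
for (1 .. 3) { $fixed->(); }
my $after_fixed = $string;               # untouched: only the copy was rewritten

for (1 .. 3) { $flawed->(); }
my $after_flawed = $string;              # rewritten on the first call

print "after fixed runs:  ", ($after_fixed  =~ /\x{f6}/ ? "still accented" : "plain ASCII"), "\n";
print "after flawed runs: ", ($after_flawed =~ /\x{f6}/ ? "still accented" : "plain ASCII"), "\n";
```

Note that the "return" in the fixed sub passes the caller's void context through, so the stand-in still modifies in place; it just modifies the throwaway copy instead of $string.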

Re^5: The Björk Situation
by rhesa (Vicar) on Feb 16, 2006 at 00:30 UTC
    You made the mistake of modifying $string directly so that in all but the first call, there are NO characters that need to be transliterated so it is much faster. Once that is fixed, it doesn't have such a big lead.

    Whoops! You're right, I hadn't expected it to modify $string in-place. I suppose that's due to Benchmark imposing a void context on the return.

    My lesson learned today: Never trust your own benchmarks :)