in reply to Re: The Björk Situation
in thread The Björk Situation
You can speed this up considerably by transliterating everything you can with tr/// and then using s/// only on the characters that need it, namely those that expand to more than one letter, such as Æ to AE.
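```perl
use strict;
use warnings;
use utf8;                              # assumes this file is saved as UTF-8
binmode STDOUT, ':encoding(UTF-8)';

my $string = 'ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿýÆæÞþÐðß';
print deaccent($string), "\n";

sub deaccent {
    my $phrase = shift;

    # Nothing in the accented range: return the string untouched.
    return $phrase unless $phrase =~ m/[\xC0-\xFF]/;

    # One-to-one mappings are done in a single pass by tr///.
    $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;

    # Characters that expand to two letters need s///.
    my %trans = (
        'Æ' => 'AE', 'æ' => 'ae',
        'Þ' => 'TH', 'þ' => 'th',
        'Ð' => 'TH', 'ð' => 'th',
        'ß' => 'ss',
    );
    $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;

    return $phrase;
}
```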
Benchmarking puts it at about 6 to 7 times the speed (30859/s versus 4316/s in the run below). Moving the hash assignments outside the subs speeds both versions up by about the same amount, so they stay at roughly a 6:1 ratio.
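For reference, here is a minimal sketch of the hoisted variant, assuming a UTF-8 source file: `%trans` is assigned once at file scope instead of on every call. Because a file-scoped `my` is assigned at run time, the assignment has to execute before the first call to `deaccent`. The name in the usage line is just an illustrative input.

```perl
use strict;
use warnings;
use utf8;    # assumes this file is saved as UTF-8

# Assigned once, when this statement runs, rather than on every call.
# It must run before deaccent() is first called, or %trans will be empty.
my %trans = (
    'Æ' => 'AE', 'æ' => 'ae',
    'Þ' => 'TH', 'þ' => 'th',
    'Ð' => 'TH', 'ð' => 'th',
    'ß' => 'ss',
);

sub deaccent {
    my $phrase = shift;
    return $phrase unless $phrase =~ m/[\xC0-\xFF]/;
    $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;
    $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;
    return $phrase;
}

binmode STDOUT, ':encoding(UTF-8)';
print deaccent('Björk Guðmundsdóttir'), "\n";    # prints "Bjork Guthmundsdottir"
```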
```perl
use strict;
use warnings;
use utf8;                              # assumes this file is saved as UTF-8
use Benchmark qw( cmpthese );

my $string = 'ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿýÆæÞþÐðß';

cmpthese( -5, {
    # tr/// for one-to-one mappings, s/// only for the multi-letter ones
    deaccent => sub {
        my $phrase = $string;
        return $phrase unless $phrase =~ m/[\xC0-\xFF]/;
        $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;
        my %trans = (
            'Æ' => 'AE', 'æ' => 'ae',
            'Þ' => 'TH', 'þ' => 'th',
            'Ð' => 'TH', 'ð' => 'th',
            'ß' => 'ss',
        );
        $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;
        return $phrase;
    },
    # hash lookup on every character
    deaccent2 => sub {
        my %acc = qw(
            À A  Á A  Â A  Ã A  Ä A  Å A  Æ AE Ç C
            È E  É E  Ê E  Ë E  Ì I  Í I  Î I  Ï I
            Ð TH Ñ N  Ò O  Ó O  Ô O  Õ O  Ö O  Ø O
            Ù U  Ú U  Û U  Ü U  Ý Y  Þ TH ß ss
            à a  á a  â a  ã a  ä a  å a  æ ae ç c
            è e  é e  ê e  ë e  ì i  í i  î i  ï i
            ð th ñ n  ò o  ó o  ô o  õ o  ö o  ø o
            ù u  ú u  û u  ü u  ý y  þ th ÿ y
        );
        my $text = $string;
        $text =~ s/(.)/$acc{$1} ? $acc{$1} : $1/eg;
        return $text;
    },
});
```
Results on my system:
| | Rate | deaccent2 | deaccent |
|---|---|---|---|
| deaccent2 | 4316/s | -- | -86% |
| deaccent | 30859/s | 615% | -- |
With data that contains fewer accented characters, the gap should widen considerably: deaccent returns immediately when a string has no characters in the \xC0-\xFF range, whereas deaccent2 still runs its per-character s///e over every character.
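One way to check that on your own machine is a sketch like the following, which benchmarks both approaches (with their hashes built once outside the subs) against two hypothetical inputs: a plain-ASCII string, which hits deaccent's early return, and the same string with one accented word appended. The inputs are made up for illustration, no particular timings are implied, and a UTF-8 source file is assumed.

```perl
use strict;
use warnings;
use utf8;    # assumes this file is saved as UTF-8
use Benchmark qw( cmpthese );

# Both lookup tables are built once, outside the subs.
my %trans = (
    'Æ' => 'AE', 'æ' => 'ae', 'Þ' => 'TH', 'þ' => 'th',
    'Ð' => 'TH', 'ð' => 'th', 'ß' => 'ss',
);
my %acc = qw(
    À A  Á A  Â A  Ã A  Ä A  Å A  Æ AE Ç C
    È E  É E  Ê E  Ë E  Ì I  Í I  Î I  Ï I
    Ð TH Ñ N  Ò O  Ó O  Ô O  Õ O  Ö O  Ø O
    Ù U  Ú U  Û U  Ü U  Ý Y  Þ TH ß ss
    à a  á a  â a  ã a  ä a  å a  æ ae ç c
    è e  é e  ê e  ë e  ì i  í i  î i  ï i
    ð th ñ n  ò o  ó o  ô o  õ o  ö o  ø o
    ù u  ú u  û u  ü u  ý y  þ th ÿ y
);

# tr/// for one-to-one mappings, s/// for the rest; returns early
# when nothing in the \xC0-\xFF range is present.
sub deaccent {
    my $phrase = shift;
    return $phrase unless $phrase =~ m/[\xC0-\xFF]/;
    $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;
    $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;
    return $phrase;
}

# Hash lookup on every character; never returns early.
sub deaccent2 {
    my $text = shift;
    $text =~ s/(.)/$acc{$1} ? $acc{$1} : $1/eg;
    return $text;
}

# Hypothetical inputs: plain ASCII, and the same text plus one accented word.
my $ascii = 'The quick brown fox jumps over the lazy dog. ' x 20;
my $one   = $ascii . 'Björk';

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese( -1, {
    deaccent_ascii  => sub { deaccent($ascii) },
    deaccent2_ascii => sub { deaccent2($ascii) },
    deaccent_one    => sub { deaccent($one) },
    deaccent2_one   => sub { deaccent2($one) },
});
```

Hoisting both hashes keeps the comparison about the matching strategy itself rather than about rebuilding a hash on every call.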
Re^3: The Björk Situation
by rhesa (Vicar) on Feb 15, 2006 at 19:12 UTC
by thundergnat (Deacon) on Feb 16, 2006 at 00:09 UTC
by rhesa (Vicar) on Feb 16, 2006 at 00:30 UTC
by thundergnat (Deacon) on Feb 15, 2006 at 19:32 UTC
by rhesa (Vicar) on Feb 15, 2006 at 19:52 UTC
by japhy (Canon) on Feb 15, 2006 at 22:03 UTC
by rhesa (Vicar) on Feb 15, 2006 at 22:40 UTC
by helgi (Hermit) on Feb 22, 2006 at 11:28 UTC
by DrHyde (Prior) on Feb 23, 2006 at 09:51 UTC