Re^2: RFC: How to unaccent text?

Looking at the Text::StripAccents source code, it seems quite inefficient: it splits the string into chars, loops over them replacing accented ones with their ASCII equivalents, and then joins the string again.

Ouch. That sounds like something that could be improved, and probably without changing the API. So the next version could be better... if somebody lends the author a hand. It could be you.
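For reference, the split/loop/join pattern described in the quote looks roughly like the sketch below. This is not the module's actual source: the sub name `strip_by_chars` and the tiny `%ascii_for` table are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal accented characters

# Hypothetical mapping, NOT the table shipped with Text::StripAccents.
my %ascii_for = ('á' => 'a', 'é' => 'e', 'ñ' => 'n', 'ü' => 'u');

# Character-by-character approach: split, replace in a loop, join.
sub strip_by_chars {
    my ($text) = @_;
    my @chars = split //, $text;
    for my $c (@chars) {
        # $c aliases the array element, so assigning to it edits @chars
        $c = $ascii_for{$c} if exists $ascii_for{$c};
    }
    return join '', @chars;
}

print strip_by_chars('pingüino'), "\n";    # pinguino
```

Since only the body of the sub would change, a faster implementation could be swapped in behind the same interface.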

It's so simple that it makes me wonder whether a module is actually required...

What about the datatable... Are you going to construct it by hand, every time? Or will you be using copy-and-paste?

Make it a module, it's the perfect place for it.

P.S. I suppose tr/// would be a lot more efficient than s///, at least for single-character replacements. You might benchmark the two to compare.
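A minimal sketch of such a benchmark, using the core Benchmark module. The six-character mapping, the sub names and the sample text are assumptions for illustration only; real timings depend on the Perl version and the input, so no numbers are claimed here.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal accented characters
use Benchmark qw(cmpthese);

# Hypothetical, deliberately tiny mapping: single characters only.
my %map = (
    'á' => 'a', 'é' => 'e', 'í' => 'i',
    'ó' => 'o', 'ú' => 'u', 'ñ' => 'n',
);
my $class = join '', keys %map;
my $re    = qr/([$class])/;

# tr/// variant: the mapping is fixed when the code is compiled.
sub unaccent_tr {
    my ($s) = @_;
    $s =~ tr/áéíóúñ/aeioun/;
    return $s;
}

# s/// variant: one regex match plus a hash lookup per accented character.
sub unaccent_s {
    my ($s) = @_;
    $s =~ s/$re/$map{$1}/g;
    return $s;
}

my $text = 'El señor Ramón comió jamón en el café ' x 100;

# Run each variant a fixed number of times and print a comparison table.
cmpthese(2_000, {
    'tr///' => sub { unaccent_tr($text) },
    's///'  => sub { unaccent_s($text) },
});
```

Note that tr/// only covers one-to-one replacements; a mapping such as 'ß' to 'ss' still needs s/// or a separate pass.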


Replies are listed 'Best First'.

Re^3: RFC: How to unaccent text?
by salva (Canon) on Apr 11, 2007 at 10:55 UTC

What about the datatable... Are you going to construct it by hand, every time? Or will you be using copy-and-paste?

Well, as I pointed out in my previous reply, the transformation is not unique: there can be several variations, and including the table in the code is an easy way to ensure that the right one is used.

For instance, Text::StripAccents converts 'ß' to 'ss', something unexpected for a Spanish user like me.

IMO, the right solution would be to create a set of language-dependent modules similar to Lingua::DE::ASCII.
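To make the idea concrete, here is a rough sketch of per-language tables selected by a language code. Everything in it is hypothetical: the `%table_for` layout, the `unaccent_for` sub and the table contents are illustrations, not the API or data of Lingua::DE::ASCII or any other existing module. It also assumes that characters missing from a language's table should simply be left alone.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal accented characters

# Hypothetical per-language tables (lowercase only, far from complete).
my %table_for = (
    de => { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss' },
    es => { 'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o',
            'ú' => 'u', 'ü' => 'u', 'ñ' => 'n' },
);

# Replace only the characters listed in the chosen language's table;
# anything else passes through unchanged (an assumption of this sketch).
sub unaccent_for {
    my ($lang, $text) = @_;
    my $table = $table_for{$lang}
        or die "no table for language '$lang'";
    my $class = join '', keys %$table;
    $text =~ s/([$class])/$table->{$1}/g;
    return $text;
}

print unaccent_for(de => 'Müller'),   "\n";    # Mueller
print unaccent_for(es => 'pingüino'), "\n";    # pinguino
```

The same 'ü' becomes 'ue' under the German table and 'u' under the Spanish one, which is exactly why a single global table cannot suit everybody.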
