in reply to How to count string length with latin characters?

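The form is probably submitted using UTF-8 character encoding, which uses 2 bytes for accented Latin characters such as é. If perl treats the input as raw bytes, length() counts those bytes rather than characters.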

You basically have 2 ways of dealing with this: only allow submissions in Latin-1 (which is the default perl character encoding) using the "accept-charset" attribute on the <form> tag, or make sure perl knows about your encoding:

    use Encode qw(decode);
    my $real_string = decode("utf8", $input_string); # assumes $input_string is in UTF-8 encoding
    print "string is " . length($real_string) . " characters\n"; # length() now interprets $real_string in characters instead of bytes

There is a LOT of subtle stuff going on here. You should probably read Encode first. Maybe.
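
To make the difference concrete, here is a minimal sketch comparing the two lengths. The sample value "café" and the hard-coded byte string are my own assumptions standing in for whatever the form actually submits; I have used the strict "UTF-8" name here rather than the lax "utf8".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample input: "café" as raw UTF-8 bytes,
# the way it might arrive from a UTF-8 form submission.
my $input_string = "caf\xC3\xA9";

# Turn the byte string into a character string.
my $real_string = decode("UTF-8", $input_string);

printf "bytes:      %d\n", length($input_string);  # 5 (é takes 2 bytes)
printf "characters: %d\n", length($real_string);   # 4
```

Remember to encode again on output, otherwise you just move the problem to the other end.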

Re^2: How to count string length with latin characters?
by rhesa (Vicar) on Nov 02, 2006 at 01:07 UTC
    (...) only allow submissions in latin-1 (which is the default perl character encoding) using the "accept-charset" property on the <form> tag (...)
    That's a reasonable solution, so long as you keep in mind that characters beyond Latin-1 will be submitted as numeric character references (e.g. &#321;). Taking that into account in length calculations is a *lot* harder than consistently using UTF-8 across the board.
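
A rough sketch of what that extra work looks like, assuming the submitted value is "Łódź" with Ł and ź sent as decimal references. The sample string and the simple regex are illustrative assumptions, not a complete solution: hex references are not handled, and a user who literally typed "&#321;" cannot be told apart from a substituted character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical Latin-1 submission of "Łódź": ó fits in Latin-1 (\xF3),
# while Ł (U+0141) and ź (U+017A) arrive as decimal references.
my $submitted = "&#321;\xF3d&#378;";

my $text = decode("iso-8859-1", $submitted);

# Expand decimal references into real characters before counting.
(my $expanded = $text) =~ s/&#(\d+);/chr($1)/ge;

printf "raw length:      %d\n", length($text);      # 14
printf "expanded length: %d\n", length($expanded);  # 4
```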