in reply to how to calculate the length for other language content in PERL?

G'day vasanthgk91,

When you post Unicode inside <code> tags, the characters are converted to HTML character entity references. This is one case where you should mark up your code with <pre> tags instead.

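When your source contains Unicode characters, you should use the utf8 pragma. It tells Perl that the source is encoded as UTF-8, so string literals are decoded into characters instead of being left as bytes.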

Example not using utf8:

$ perl -Mstrict -Mwarnings -E '
    my $text = q{சிவகாசி அருகே வெடி விபத்து : 2 பேர் உயிரிழப்பு};
    say length $text;
'
120

Example using utf8:

$ perl -Mstrict -Mwarnings -E '
    use utf8;
    my $text = q{சிவகாசி அருகே வெடி விபத்து : 2 பேர் உயிரிழப்பு};
    say length $text;
'
46
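
The two results are consistent with UTF-8 encoding: going by the tally in the update below (37 Tamil code points plus 9 ASCII characters), 37 * 3 bytes + 9 * 1 byte = 120 bytes, and 37 + 9 = 46 characters. Here is a minimal sketch of the same relationship using the core Encode module, assuming the script is saved as UTF-8. The helper names utf8_byte_length and utf8_char_length are my own, purely for illustration:

```perl
use strict;
use warnings;
use utf8;                         # string literals below are decoded to characters
use feature 'say';
use Encode qw(encode decode);

# Number of bytes the string occupies once encoded as UTF-8.
# (Hypothetical helper name, for illustration only.)
sub utf8_byte_length {
    my ($characters) = @_;
    return length encode('UTF-8', $characters);
}

# Number of characters in a string of raw UTF-8 bytes,
# e.g. data read from a file or socket without a decoding layer.
# (Hypothetical helper name, for illustration only.)
sub utf8_char_length {
    my ($bytes) = @_;
    return length decode('UTF-8', $bytes);
}

my $text  = q{சிவகாசி அருகே வெடி விபத்து : 2 பேர் உயிரிழப்பு};
my $bytes = encode('UTF-8', $text);

say length $text;               # characters (code points)
say utf8_byte_length($text);    # bytes in the UTF-8 encoding
say utf8_char_length($bytes);   # characters again, after decoding
```

The same decoding step is what you would probably want when the text comes from outside the source file, where the utf8 pragma has no effect.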

Having no idea what constitutes a character in the Tamil language, I'll leave it to you to tell me if 46 is the right answer.
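
If what you count as a character is a base letter together with its attached vowel signs, rather than each code point, then the number you want may be the count of extended grapheme clusters. Here is a rough sketch using the \X regex escape, which matches one cluster at a time. The helper name grapheme_count is my own, and whether clusters correspond to what a Tamil reader would call a character is an assumption on my part:

```perl
use strict;
use warnings;
use utf8;                 # the source contains literal Unicode
use feature 'say';

# Count extended grapheme clusters; each \X match consumes one cluster.
# (Hypothetical helper name, for illustration only.)
sub grapheme_count {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;   # list assignment in scalar context yields the match count
    return $count;
}

my $text = q{சிவகாசி அருகே வெடி விபத்து : 2 பேர் உயிரிழப்பு};

say 'code points: ', length $text;
say 'graphemes:   ', grapheme_count($text);
```

As a simple check of the idea, "e" followed by a combining acute accent (U+0301) has a length of 2 but counts as a single cluster.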

Update: Actually, your result of 268 does add up once each Tamil character has been replaced by an entity: 7 (spaces) + 1 (:) + 1 (2) + 37 * 7 (the length of each '&#nnnn;') = 268, cf. "output I get wrong==268"

$ perl -E 'say(7+1+1+37*7)'
268

-- Ken

Re^2: how to calculate the length for other language content in PERL?
by vasanthgk91 (Sexton) on May 18, 2013 at 05:34 UTC
    Thank you.