Lejocode has asked for the wisdom of the Perl Monks concerning the following question:

Hello again Monks, I have a problem with TTFMetrics giving me wrong values for Unicode strings.

Here's an example:
(The string doesn't show properly in the code section, so here it is: "بلحة")
use warnings;
use strict;
use Font::TTFMetrics;
use utf8;

my $metrics = Font::TTFMetrics->new("Arial.ttf");

# 16px
my $str_en  = ($metrics->string_width("Balha") * 1152) / (147456);
my $str_ar  = ($metrics->string_width("بلحة") * 1152) / (147456);
my $str_ar2 = ($metrics->string_width("\x{0628}\x{0644}\x{062D}\x{0629}") * 1152) / (147456);

print ('My English string: ' . "$str_en" . "px\n");
print ('My Unicode string: ' . "$str_ar" . "px\n");
print ('My Unicode string 2: ' . "$str_ar2" . "px\n");

The output is:
My English string: 40.921875px
My Unicode string: 33.0390625px
My Unicode string 2: 33.0390625px


The width of the English string is correct, but the width of the Unicode string is wrong; it should be 21 px.
Is there something I should do to make it right?
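
For readers puzzling over the constants in the snippet: 1152 and 147456 look like 16 * 72 and 72 * 2048, that is, point size times resolution, divided by 72 points per inch times the font's units per em. The sketch below spells that conversion out in plain Perl without loading the font. The helper name `units_to_px` is made up for illustration, 2048 units per em is assumed for Arial, and the font-unit widths 5238 and 4229 are back-computed from the output printed above rather than read from the font.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a width in font units to pixels.
# pixels = units * pointsize * dpi / (72 points per inch * units per em)
sub units_to_px {
    my ($units, $pointsize, $dpi, $units_per_em) = @_;
    return $units * $pointsize * $dpi / (72 * $units_per_em);
}

my $pointsize    = 16;
my $dpi          = 72;
my $units_per_em = 2048;    # assumption: the usual value for Arial

# The magic constants from the question, rebuilt from named quantities.
my $numerator   = $pointsize * $dpi;       # 1152
my $denominator = 72 * $units_per_em;      # 147456

# Widths in font units, back-computed from the posted pixel values.
my $en_px = units_to_px(5238, $pointsize, $dpi, $units_per_em);
my $ar_px = units_to_px(4229, $pointsize, $dpi, $units_per_em);

print "scale: $numerator / $denominator\n";
print "English: ${en_px}px, Arabic (unshaped): ${ar_px}px\n";
```

Writing the scale this way makes it easier to change the point size or resolution later without recomputing the constants by hand.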

Replies are listed 'Best First'.
Re: How to get an accurate TTFMetrics values for unicode strings?
by vr (Curate) on Mar 09, 2017 at 18:35 UTC

    It's not a "Unicode" issue. Arabic has different glyphs for the same letter depending on context. A quick and dirty solution:

    use strict;
    use warnings;
    use Font::TTFMetrics;

    my $pointsize  = 16;
    my $resolution = 72;

    my $metrics = Font::TTFMetrics-> new( "Arial.ttf" );
    my $c = $pointsize * $resolution / 72 / $metrics-> get_units_per_em;

    my $str = "\x{0628}\x{0644}\x{062D}\x{0629}";

    printf "Wrong answer: %.0f\n", $metrics-> string_width( $str ) * $c;
    printf "Right answer: %.0f\n", $metrics-> string_width( render_arabic( $str )) * $c;

    sub render_arabic {
        my $word = shift;
        return $word if length $word == 1;    # isolated

        # base letter => [ final, medial, initial ] presentation forms
        my %LUT = (
            "\x{0628}" => [ "\x{FE90}", "\x{FE92}", "\x{FE91}" ],
            "\x{0644}" => [ "\x{FEDE}", "\x{FEE0}", "\x{FEDF}" ],
            "\x{062D}" => [ "\x{FEA2}", "\x{FEA4}", "\x{FEA3}" ],
            "\x{0629}" => [ "\x{FE94}" ],
        );

        $word =~ s/^.         /$LUT{ $& }[ 2 ]/x;     # first letter: initial form
        $word =~ s/(?<=.).(?=.)/$LUT{ $& }[ 1 ]/xg;   # inner letters: medial form
        $word =~ s/         .$/$LUT{ $& }[ 0 ]/x;     # last letter: final form

        return $word
    }

    Wrong answer: 33
    Right answer: 22
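
One way to sanity-check a hand-written lookup table like this one is to ask Unicode itself: each Arabic presentation form carries a compatibility decomposition back to its base letter, so NFKD of every table entry should give the hash key it is filed under. The sketch below does only that check, using the core modules Unicode::Normalize and charnames. It copies the table from the code above and does not touch the font; the counter `$mismatches` is just an illustrative name.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);
use charnames ();

# Same table as in render_arabic: base letter => [ final, medial, initial ]
my %LUT = (
    "\x{0628}" => [ "\x{FE90}", "\x{FE92}", "\x{FE91}" ],
    "\x{0644}" => [ "\x{FEDE}", "\x{FEE0}", "\x{FEDF}" ],
    "\x{062D}" => [ "\x{FEA2}", "\x{FEA4}", "\x{FEA3}" ],
    "\x{0629}" => [ "\x{FE94}" ],
);

my $mismatches = 0;
for my $base (sort keys %LUT) {
    for my $form (@{ $LUT{$base} }) {
        # A presentation form should decompose to exactly its base letter.
        my $ok = NFKD($form) eq $base;
        $mismatches++ unless $ok;
        printf "U+%04X %-40s %s\n",
            ord $form,
            charnames::viacode(ord $form) // '(unnamed)',
            $ok ? 'ok' : 'MISMATCH';
    }
}
print "mismatches: $mismatches\n";
```

The printed character names also show which slot (final, medial, initial) each entry belongs to, which helps when extending the table to more letters. Note that the table treats every letter as joining on both sides; that holds for this word but not for Arabic in general.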
      That's dirty indeed. Actually, I was hoping for a simple solution, as I'm very new to this and there are still many things I don't know. An Anonymous Monk mentioned that TTFMetrics is 14 years old, so is there a newer Perl module that can do this with a single line of code?

        Any reason to expect a simple solution for a complex problem? With a "single line of code"? What you are looking for is not just the sum of the metrics for an arbitrary sequence of Unicode code points, but a layout engine for complex writing systems, with ligatures, contextual substitutions, etc. That is the magic which happens when you type 4 characters and they "automatically" change their shape (and width, if that's what you are after) each time a new character is added.

        Think Uniscribe, or Pango. Apropos of Pango:

        use strict;
        use warnings;
        use feature 'say';
        use Pango;

        my $surface = Cairo::ImageSurface-> create( 'argb32', 200, 100 );
        my $cr      = Cairo::Context-> create( $surface );
        my $layout  = Pango::Cairo::create_layout( $cr );
        my $font    = Pango::FontDescription-> from_string( 'Arial 16' );

        #Pango::Cairo::Context-> set_resolution( $cr, 72 );
        $layout-> set_font_description( $font );
        $layout-> set_text( "\x{0628}\x{0644}\x{062D}\x{0629}" );

        say for Pango::Layout::get_pixel_size( $layout );

        Not a single line, is it? But "new", definitely. It reports the width at the default resolution of 96 dpi, and the commented-out line is there because I didn't figure out how to change that.

        If a "single line of code" is a priority, you can qx( pango-view ... ) and parse the PGM file for the width; all of that fits nicely into a single line.

Re: How to get an accurate TTFMetrics values for unicode strings?
by Anonymous Monk on Mar 09, 2017 at 17:07 UTC
    Are you sure that font contains Arabic characters? Also, TTFMetrics does claim to support Unicode, but it's a version 0.1 module from 14 years ago...
      Yes, Arial has them. Plus, I tried other fonts too, with the same result.