
G'day from across the ditch, Ken.  You're talkin' my language, mate.

Thanks very much for your time and all your tips.

The reason I wrote $f1 to the file and read it back into $f2 was just to make sure the variables weren't changing in the process, and from what I can tell they aren't. I'm struggling to understand how this issue is about writing/reading the file. My reasons are:

1. If I remove my "quick hack" and change the webpage's "Output: $f1" line to "Output: $f2" (which it was meant to be originally - sorry), the e-acute appears on the webpage correctly.

2. If I print $f1 (which has not been read from a file) to the PDF (e.g. $text->text("PDF Output:$f1=$f2");), the e-acute does not appear correctly.

3. If I write $f1 to a file as you have suggested, and read it back into $f3, it then contains more bytes than $f1, and printing $f3 to the PDF still doesn't print e-acute properly.

Below is some modified code which demonstrates this (sorry, I haven't brought it into the general coding standards you've suggested at this stage).

#!/usr/bin/perl
use lib "/home/tospeirs/perl5/lib/perl5";
use CGI;
use PDF::API2;
use bytes;
use constant mm => 25.4 / 72;

$cgi = new CGI;
$f1 = $cgi->param(f1);

if (defined($f1)) {
    open (FILE, ">utf8_test1.out") or die "Can't open outfile";
    print FILE $f1;
    close FILE;

    open (FILE, "<utf8_test1.out") or die "Can't open infile";
    $f2 = <FILE>;
    close FILE;

    open my $fh, '>:encoding(UTF-8)', 'utf8_test2.out';
    print $fh $f1;
    close $fh;

    open my $fh, '<:encoding(UTF-8)', 'utf8_test2.out';
    $f3 = <$fh>;
    close $fh;

    $lengths = "Lengths: f1=" . bytes::length($f1)
             . ", f2=" . bytes::length($f2)
             . ", f3=" . bytes::length($f3);
    $cmp  = ($f1 eq $f2) ? 'f1=f2' : 'f1<>f2';
    $cmp .= ($f1 eq $f3) ? ', f1=f3' : ', f1<>f3';

    $pdf = PDF::API2->new();
    $font1 = $pdf->corefont('Arial');
    $page = $pdf->page; # Add blank page
    $page->mediabox(210/mm, 297/mm);
    $text = $page->text();
    $text->font($font1, 28);
    $text->translate(5/mm ,280/mm);

    # A quick hack to handle a couple of special chars
    #$f2 =~ s/\303\251/\351/g; # e-acute
    #$f2 =~ s/\303\272/\372/g; # u-acute

    $text->text("PDF Output:$f1=$f2=$f3");
    $pdf->saveas('utf8_test1.pdf');
}

print <<EOF;
Content-Type: text/html; charset=utf-8\n
<!DOCTYPE html>
<html lang='en-NZ'>
<head>
<title>Test UTF-8</title>
<meta charset='UTF-8'>
</head>
<body>
<form method='post'>
Input: <input type='text' name='f1' value='$f1'>
<br>
<input type='submit' name='submit' value='Submit'>
<br>
Output f2: $f2
<br>
Output f3: $f3
<br>
$lengths
<br>
$cmp
</form>
</body>
</html>
EOF

This is what I see on the webpage after I submit "Cliché.":
Input: Cliché.
Submit
Output f2: Cliché.
Output f3: Cliché.
Lengths: f1=8, f2=8, f3=10
f1=f2, f1<>f3
And the PDF ends up containing this:
PDF Output:Cliché.=Cliché.=Cliché.
As you can see, none of those 3 came out right in the PDF, and the $f3 looks extra long, as if it's been double-encoded or something.  Check this octal dump out:
$ od -c utf8_test1.out
0000000   C   l   i   c   h 303 251   .
$ od -c utf8_test2.out
0000000   C   l   i   c   h 303 203 302 251   .
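For anyone who wants to reproduce those two byte counts without CGI or PDF::API2, here is a minimal sketch. It assumes $f1 arrives as raw UTF-8 bytes (which is what the 8-byte length suggests), and it writes to in-memory filehandles instead of real files. The helper names bytes_written and octal_dump are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the form value is a plain byte string holding UTF-8,
# i.e. the e-acute is stored as the two bytes \303\251.
my $f1 = "Clich\303\251.";

# Hypothetical helper: print $string through the given I/O layer into an
# in-memory file and return the bytes that would have landed on disk.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open my $out, ">$layer", \$buffer or die "Can't open in-memory file: $!";
    print {$out} $string;
    close $out;
    return $buffer;
}

# Hypothetical helper: show every byte as a three-digit octal value.
sub octal_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%03o', ord($_)) } split //, $bytes;
}

my $raw     = bytes_written(':raw', $f1);              # like utf8_test1.out
my $layered = bytes_written(':encoding(UTF-8)', $f1);  # like utf8_test2.out

printf "no encoding layer: %2d bytes: %s\n", length($raw),     octal_dump($raw);
printf "UTF-8 layer:       %2d bytes: %s\n", length($layered), octal_dump($layered);
```

On my reading, the second file grows because the layer treats each of the two bytes \303 and \251 as a separate character and encodes each one again.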
Any ideas?

Thanks.
tel2

Re^3: Write special chars to PDF. UTF8?
by poj (Abbot) on Feb 13, 2016 at 09:44 UTC

    Try using decode() for the pdf

    #!/perl
    use strict;
    use warnings;
    use CGI;
    use CGI::Carp 'fatalsToBrowser';
    use PDF::API2;
    use Encode;

    my $cgi = new CGI;
    my $f1 = $cgi->param('f1');
    my $f2 = decode('UTF-8', $f1 );

    open OUT,'>','c:/temp/web/pdf.txt' or die; # change path to suit
    print OUT "$f1 $f2";
    close OUT;

    my $pdf = PDF::API2->new()->mediabox('A4');
    my $text = $pdf->page->text;
    my $font1 = $pdf->corefont('Arial');
    $text->font($font1, 36);
    $text->translate(100,500);
    $text->text("f1 = $f1");
    $text->translate(100,600);
    $text->text("f2 = $f2");
    $pdf->saveas('c:/temp/web/utf8_test1.pdf'); # change path to suit

    print <<EOF;
    Content-Type: text/html; charset=UTF-8\n
    <!DOCTYPE html>
    <html lang='en-NZ'>
    <head>
    <title>Test UTF-8</title>
    <meta charset="UTF-8">
    </head><body>
    $f1 $f2
    <form method="post">
    Input: <input type="text" name="f1" value="$f1"><br>
    <input type="submit" name="submit" value="Submit">
    </form></body></html>
    EOF

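    If it helps to see what decode() changes, here is a small standalone sketch using only the core Encode module. It assumes the form value arrives as the 8 raw bytes shown in your od dump. My understanding (not verified against the PDF::API2 source) is that corefont works with a single-byte encoding by default, so it wants the one character U+00E9 rather than the two UTF-8 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: this is what CGI handed over, per the od dump of utf8_test1.out.
my $bytes = "Clich\303\251.";

# decode() turns the UTF-8 byte sequence into a string of characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length($bytes);
printf "chars: length %d, sixth character is U+%04X\n",
    length($chars), ord(substr($chars, 5, 1));

# Encoding the character string gives back the original bytes, which is
# why a decoded string can safely go through an :encoding(UTF-8) layer.
my $round_trip = encode('UTF-8', $chars);
print "round trip matches original bytes\n" if $round_trip eq $bytes;
```

    The rule of thumb I would suggest: decode once on the way in, work with characters, and encode once on the way out.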
    poj
      Thank you very much for that code, poj!

      That's working for me.

      tel2