Thanks very much for your time and all your tips.
The reason I wrote $f1 to the file and read it back into $f2 was just to make sure the variables weren't changing in the process, and from what I can tell they aren't. I'm struggling to see how this issue could be caused by writing or reading the file. My reasons are:
1. If I remove my "quick hack" and change the webpage's "Output: $f1" line to "Output: $f2" (which is what it was originally meant to be; sorry), the e-acute appears correctly on the webpage.
2. If I print $f1 (which has not been read from a file) to the PDF (e.g. $text->text("PDF Output:$f1=$f2");), none of the e-acutes appear correctly.
3. If I write $f1 to a file as you have suggested, and read it back into $f3, it then contains more bytes than $f1, and printing $f3 to the PDF still doesn't print e-acute properly.
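Point 3 can be reproduced without a CGI request or a file on disk. Here is a minimal sketch under two assumptions: a string literal stands in for $cgi->param('f1') (holding the raw UTF-8 octets, which is what the 8-byte length suggests), and an in-memory filehandle replaces utf8_test2.out. It repeats the :encoding(UTF-8) write and read and prints the character and byte lengths.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: this literal stands in for $cgi->param('f1') and holds the
# 8 raw UTF-8 octets for "Cliché." (the e-acute is the pair \303\251).
my $f1 = "Clich\303\251.";

# Same round trip as utf8_test2.out, but through an in-memory filehandle.
open my $out, '>:encoding(UTF-8)', \my $buf or die "write open: $!";
print {$out} $f1;
close $out;

open my $in, '<:encoding(UTF-8)', \$buf or die "read open: $!";
my $f3 = <$in>;
close $in;

# Internal byte length of $f3, i.e. the number bytes::length reports.
my $f3_bytes = do { use bytes; length $f3 };

printf "buf: %d octets, f1: %d chars, f3: %d chars, f3 internal: %d bytes\n",
    length($buf), length($f1), length($f3), $f3_bytes;
# prints: buf: 10 octets, f1: 8 chars, f3: 8 chars, f3 internal: 10 bytes

# Normally eq compares characters, so $f1 and $f3 match...
print $f1 eq $f3 ? "f1 eq f3\n" : "f1 ne f3\n";    # f1 eq f3

# ...but under 'use bytes' it compares the internal bytes (8 vs. 10).
{
    use bytes;
    print $f1 eq $f3 ? "bytes: f1 eq f3\n" : "bytes: f1 ne f3\n";    # bytes: f1 ne f3
}
```

If this behaves the same way on your Perl, it suggests the f1<>f3 result comes from the use bytes line at the top of the script, not from any change in the characters themselves.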
Below is some modified code which demonstrates this (sorry, I haven't brought it into the general coding standards you've suggested at this stage).
```perl
#!/usr/bin/perl
use lib "/home/tospeirs/perl5/lib/perl5";
use CGI;
use PDF::API2;
use bytes;
use constant mm => 25.4 / 72;

$cgi = new CGI;
$f1 = $cgi->param(f1);
if (defined($f1)) {
    open (FILE, ">utf8_test1.out") or die "Can't open outfile";
    print FILE $f1;
    close FILE;
    open (FILE, "<utf8_test1.out") or die "Can't open infile";
    $f2 = <FILE>;
    close FILE;

    open my $fh, '>:encoding(UTF-8)', 'utf8_test2.out';
    print $fh $f1;
    close $fh;
    open my $fh, '<:encoding(UTF-8)', 'utf8_test2.out';
    $f3 = <$fh>;
    close $fh;

    $lengths = "Lengths: f1=" . bytes::length($f1) . ", f2=" . bytes::length($f2) . ", f3=" . bytes::length($f3);
    $cmp = ($f1 eq $f2) ? 'f1=f2' : 'f1<>f2';
    $cmp .= ($f1 eq $f3) ? ', f1=f3' : ', f1<>f3';

    $pdf = PDF::API2->new();
    $font1 = $pdf->corefont('Arial');
    $page = $pdf->page;    # Add blank page
    $page->mediabox(210/mm, 297/mm);
    $text = $page->text();
    $text->font($font1, 28);
    $text->translate(5/mm, 280/mm);
    # A quick hack to handle a couple of special chars
    #$f2 =~ s/\303\251/\351/g;    # e-acute
    #$f2 =~ s/\303\272/\372/g;    # u-acute
    $text->text("PDF Output:$f1=$f2=$f3");
    $pdf->saveas('utf8_test1.pdf');
}

print <<EOF;
Content-Type: text/html; charset=utf-8\n
<!DOCTYPE html>
<html lang='en-NZ'>
<head>
<title>Test UTF-8</title>
<meta charset='UTF-8'>
</head>
<body>
<form method='post'>
Input: <input type='text' name='f1' value='$f1'>
<br>
<input type='submit' name='submit' value='Submit'>
<br>
Output f2: $f2
<br>
Output f3: $f3
<br>
$lengths
<br>
$cmp
</form>
</body>
</html>
EOF
```

This is what I see on the webpage after I submit "Cliché.":
```
Input: Cliché.
Submit
Output f2: Cliché.
Output f3: Cliché.
Lengths: f1=8, f2=8, f3=10
f1=f2, f1<>f3
```

And the PDF ends up containing this:
```
PDF Output:Cliché.=Cliché.=Cliché.
```

As you can see, none of those three came out right in the PDF, and $f3 looks extra long, as if it had been double-encoded. Check out this octal dump:
```
$ od -c utf8_test1.out
0000000   C   l   i   c   h 303 251   .
$ od -c utf8_test2.out
0000000   C   l   i   c   h 303 203 302 251   .
```

Any ideas?
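The two od dumps above can also be reproduced in memory. This is a rough sketch, assuming the Encode module that ships with Perl and a literal in place of the form value: it encodes the already-UTF-8 octets a second time and prints each octet in octal, which should line up with the numeric od columns. For comparison it also decodes the octets once, which gives a 7-character string where the e-acute is the single code point U+00E9. Whether a decoded string like that is what PDF::API2 wants is an assumption I haven't tested here. The helper octal_dump is a hypothetical name used only for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: these are the raw octets CGI.pm returned for "Cliché.",
# i.e. the contents of utf8_test1.out.
my $octets = "Clich\303\251.";

# Hypothetical helper: print every octet as three-digit octal,
# like the numeric columns of od -c.
sub octal_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%03o', ord } split //, $s;
}

# Encoding octets that are already UTF-8 treats each one as a Latin-1
# character, so \303\251 becomes \303\203\302\251 (as in utf8_test2.out).
my $double = encode('UTF-8', $octets);

# Decoding once instead turns the e-acute pair into one character, U+00E9.
my $chars = decode('UTF-8', $octets);

print "single:  ", octal_dump($octets), "\n";   # 103 154 151 143 150 303 251 056
print "double:  ", octal_dump($double), "\n";   # 103 154 151 143 150 303 203 302 251 056
printf "decoded: %d chars, e-acute is U+%04X\n",
    length($chars), ord(substr($chars, 5, 1));  # decoded: 7 chars, e-acute is U+00E9
```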
Thanks.
tel2
In reply to Re^2: Write special chars to PDF. UTF8? by tel2, in thread Write special chars to PDF. UTF8? by tel2