G'day from across the ditch, Ken. You're talkin' my language, mate.

Thanks very much for your time and all your tips.

The reason I wrote $f1 to the file and read it back into $f2 was just to make sure the variables weren't changing in the process, and from what I can tell they aren't. I'm struggling to understand how this issue is about writing/reading the file. My reasons are:

1. If I remove my "quick hack" and change the webpage's "Output: $f1" line to "Output: $f2" (which it was meant to be originally - sorry), the e-acute appears on the webpage correctly.

2. If I print $f1 (which has not been read from a file) to the PDF (e.g. $text->text("PDF Output:$f1=$f2");), the e-acute does not appear correctly.

3. If I write $f1 to a file as you have suggested, and read it back into $f3, it then contains more bytes than $f1, and printing $f3 to the PDF still doesn't print e-acute properly.

Below is some modified code which demonstrates this (sorry, I haven't brought it into the general coding standards you've suggested at this stage).

#!/usr/bin/perl

use lib "/home/tospeirs/perl5/lib/perl5";
use CGI;
use PDF::API2;
use bytes;

use constant mm => 25.4 / 72;

$cgi = new CGI;
$f1 = $cgi->param('f1');

if (defined($f1))
{
        open (FILE, ">utf8_test1.out") or die "Can't open outfile";
        print FILE $f1;
        close FILE;

        open (FILE, "<utf8_test1.out") or die "Can't open infile";
        $f2 = <FILE>;
        close FILE;

        open my $fh, '>:encoding(UTF-8)', 'utf8_test2.out';
        print $fh $f1;
        close $fh;

        open my $fh, '<:encoding(UTF-8)', 'utf8_test2.out';
        $f3 = <$fh>;
        close $fh;

        $lengths = "Lengths: f1=" . bytes::length($f1) . ", f2=" . bytes::length($f2) . ", f3=" . bytes::length($f3);
        $cmp = ($f1 eq $f2) ? 'f1=f2' : 'f1<>f2';
        $cmp .= ($f1 eq $f3) ? ', f1=f3' : ', f1<>f3';

        $pdf = PDF::API2->new();

        $font1 = $pdf->corefont('Arial');

        $page = $pdf->page;             # Add blank page
        $page->mediabox(210/mm, 297/mm);

        $text = $page->text();

        $text->font($font1, 28);
        $text->translate(5/mm ,280/mm);

        # A quick hack to handle a couple of special chars
        #$f2 =~ s/\303\251/\351/g;  # e-acute
        #$f2 =~ s/\303\272/\372/g;  # u-acute

        $text->text("PDF Output:$f1=$f2=$f3");

        $pdf->saveas('utf8_test1.pdf');
}

print <<EOF;
Content-Type: text/html; charset=utf-8\n
<!DOCTYPE html>
<html lang='en-NZ'>
<head>
        <title>Test UTF-8</title>
        <meta charset='UTF-8'>
</head>
<body>
<form method='post'>
        Input: <input type='text' name='f1' value='$f1'>
        <br>
        <input type='submit' name='submit' value='Submit'>
        <br>
        Output f2: $f2
        <br>
        Output f3: $f3
        <br>
        $lengths
        <br>
        $cmp
</form>
</body>
</html>
EOF

This is what I see on the webpage after I submit "Cliché.":

Input: Cliché.
Submit
Output f2: Cliché.
Output f3: ClichÃ©.
Lengths: f1=8, f2=8, f3=10
f1=f2, f1<>f3

And the PDF ends up containing this:

PDF Output:ClichÃ©.=ClichÃ©.=ClichÃƒÂ©.

As you can see, none of those 3 came out right in the PDF, and the $f3 looks extra long, as if it's been double-encoded or something. Check this octal dump out:

$ od -c utf8_test1.out
0000000   C   l   i   c   h 303 251   .
$ od -c utf8_test2.out
0000000   C   l   i   c   h 303 203 302 251   .
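The byte counts above can be reproduced without CGI or any files. This is only a minimal sketch: it assumes $f1 arrives from CGI as raw, undecoded UTF-8 octets (which the 8-byte length suggests), and the names $octets, $double, $chars and $latin1 are made up for the illustration. The last step assumes the single-byte value that the commented-out "quick hack" substitutes is what the core font wants; that part is a guess, not something confirmed by PDF::API2.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: this is what $f1 holds after $cgi->param('f1'),
# i.e. the same 8 octets that od shows in utf8_test1.out.
my $octets = "Clich\303\251.";

# Encoding a string that is already UTF-8 octets treats each octet as a
# separate character and encodes it again (the suspected double encoding).
my $double = encode('UTF-8', $octets);

# Decoding first turns the two octets 303 251 into one e-acute character.
my $chars = decode('UTF-8', $octets);

# Re-encoding the decoded string as Latin-1 gives the single byte \351,
# the same value the quick hack substitutes by hand.
my $latin1 = encode('ISO-8859-1', $chars);

printf "octets: %d, double-encoded: %d, characters: %d, latin1: %d\n",
    length($octets), length($double), length($chars), length($latin1);

# Octal dump of the double-encoded string, for comparison with od -c.
printf "double-encoded: %s\n",
    join ' ', map { sprintf '%03o', ord } split //, $double;
```

If the assumption holds, the lengths come out as 8, 10, 7 and 7, and the dump contains 303 203 302 251 in the same place as utf8_test2.out.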

Any ideas?

Thanks.
tel2

In reply to Re^2: Write special chars to PDF. UTF8? by tel2
in thread Write special chars to PDF. UTF8? by tel2
