PerlMonks  


This is a small utility module I wrote for my work. I've tried to make it reasonably DWIMmish and equip it with enough convenience features that it gets out of your face as quickly as possible.

#!/usr/bin/perl -w

=pod

=head1 NAME

PostScript::Glyph::MapToUnicode - PostScript glyph name to Unicode conversion

=head1 SYNOPSIS

    use PostScript::Glyph::MapToUnicode file => '/usr/doc/PostScript/aglfn13.txt';
    print PostScript::Glyph::MapToUnicode::map('Euro'), "\n";

=head1 DESCRIPTION

This module implements (most of - see L</"BUGS">) the PostScript glyph name to
Unicode code point conversion algorithm described by Adobe at
L<http://partners.adobe.com/asn/tech/type/unicodegn.jsp>.

To do something more than marginally useful with this module you should
download the B<Adobe Glyph List> from
L<http://partners.adobe.com/asn/tech/type/aglfn13.txt>.

=head1 INTERFACE

=over 4

=item parse_adobeglyphlist()

This function parses an B<Adobe Glyph List> file and returns true on success.
On failure, it returns false and supplies an error message in the package
variable C<$ERROR>. It expects its first argument to specify how to retrieve
the data. The following options exist:

=over 4

=item file

Takes the name of a file containing the B<Adobe Glyph List>.

=item fh

Takes a filehandle reference that should be open on a file containing the
Adobe Glyph List.

=item array

Takes an array reference. Each array element is expected to contain one line
from the B<Adobe Glyph List>.

=item data

Takes a scalar that is expected to contain the entire B<Adobe Glyph List> file.

=back

For convenience, you can pass the same parameter to the module's C<import()>
function, as exemplified in L</"SYNOPSIS">. It will croak if it encounters any
errors.

=item map()

Takes a list of strings containing whitespace separated PostScript glyphs and
returns them concatenated as a single string in Unicode encoding. You may want
to memoize this function when processing large PostScript documents.

=back

=head1 BUGS

C<map()> does not take the font into account, so it will produce incorrect
results for glyphs from the B<ZapfDingbats> font.

=head1 AUTHOR

Aristotle Pagaltzis L<mailto:pagaltzis@gmx.de>

=head1 COPYRIGHT

This program is Copyright (c)2003 Aristotle Pagaltzis. All rights reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of either: a) the GNU General Public License as published by the
Free Software Foundation; either version 1, or (at your option) any later
version, or b) the "Artistic License" which comes with Perl.

=head1 DISCLAIMER

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See either the GNU General Public License or the
Artistic License for more details.

=cut

package PostScript::Glyph::MapToUnicode;

use strict;
use vars qw($ERROR);

my $uni_notation = qr{
    \A uni
    (
        (?:
            [0-9ABCEF] [\dA-F] {3}
            |
            D [0-7] [\dA-F] {2}
        )+
    )
    \z
}x;

my $u_notation = qr{
    \A u
    (
        [0-9ABCEF] [\dA-F] {3,5}
        |
        (?:
            D [0-7] [\dA-F] {2,3}
            |
            D [8-9A-F] [\dA-F] {3}
        )
    )
    \z
}x;

my %agl;

sub map {
    my $digits;
    return join '', map {
        exists $agl{$_}
            ? $agl{$_}
            : (($digits) = m/$uni_notation/)
                ? map { pack "U", hex } $digits =~ /(....)/g
                : (($digits) = m/$u_notation/)
                    ? pack "U", hex $digits
                    : do { '' };
    }
    map { split /_/ }
    map { /\A(.+?)\./ ? $1 : $_ }
    map { split }
    @_;
}

sub parse_adobeglyphlist {
    my $method = shift;

    my $data =
        $method eq 'array' ? do {
            my $array = shift;
            unless(ref $array eq 'ARRAY') {
                $ERROR = "Expected array reference in '$array'";
                return;
            }
            $array;
        } :
        $method eq 'data' ? [ split /^/m, shift ] :
        ($method eq 'file' or $method eq 'fh') ? do {
            my $fh = $method eq 'fh' ? shift : do {
                open my $fh, '<', shift or ($ERROR = "$!", return);
                $fh;
            };
            [ <$fh> ];
        } :
        ($ERROR = "Unknown parsing interface '$method'", return);

    %agl = do {
        @$data = grep !/\A (?: \# | \s* \z)/x, @$data;
        chomp @$data;
        map {
            my ($code_pt, $glyph) = split /;/;
            ($glyph => pack "U", hex $code_pt);
        } @$data;
    };

    delete $agl{'.notdef'};

    return 1;
}

sub import {
    shift;
    unless(&parse_adobeglyphlist) {
        require Carp;
        Carp::croak("Failed to parse AdobeGlyphList: $ERROR");
    }
}

1;

Not using any exports was a conscious choice. One big factor was that I absolutely despise the Exporter interface. Considering how little there is to possibly export in the first place, I don't want to introduce a dependency on a non-core exporter either. And lastly, writing my own import() gives me a nice opportunity to be convenient - however, it would be difficult to bend its semantics far enough sideways to allow the user to specify when s/he doesn't actually want to import anything (not that I could be bothered writing a sufficiently flexible exporter anyway).
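For readers unfamiliar with the mechanism: `use Module LIST` calls `Module->import(LIST)` at compile time, which is what lets a hand-written import() double as a loader. The following is only a minimal sketch of that idea, not the module's code. The package name `My::GlyphMap`, the `parse` and `lookup` helpers, the early return on an empty argument list, and the sample glyph lines are all assumptions made for illustration.

```perl
use strict;
use warnings;

package My::GlyphMap;    # hypothetical stand-in, not the module from the post

our $ERROR;
my %agl;

# Simplified parser: only the 'data' interface is sketched here.
sub parse {
    my ($method, $source) = @_;
    if ($method ne 'data') {
        $ERROR = "Unknown parsing interface '$method'";
        return;
    }
    %agl = map {
        chomp(my $line = $_);
        my ($code_pt, $glyph) = split /;/, $line;
        ($glyph => pack 'U', hex $code_pt);
    } grep { !/\A(?:\#|\s*\z)/ } split /^/m, $source;
    return 1;
}

# "use My::GlyphMap data => ...;" would end up here with (data => ...).
sub import {
    my $class = shift;
    return unless @_;    # assumption: a bare "use" loads no glyph list
    parse(@_) or do {
        require Carp;
        Carp::croak("Failed to parse glyph list: $ERROR");
    };
}

sub lookup { $agl{ $_[0] } }

package main;

# Calling import() directly is what "use" does behind the scenes.
# The sample lines are illustrative: code point first, then glyph name.
My::GlyphMap->import(
    data => "# comment\n20AC;Euro;EURO SIGN\n0041;A;LATIN CAPITAL LETTER A\n"
);
printf "U+%04X\n", ord My::GlyphMap::lookup('Euro');
```

Note that `use Module ();` never calls import() at all, so a caller who wants the module loaded without any parsing already has that option.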

Other than that, I don't really have strong opinions on any of my choices.

What do people think of the name? The POD? The code?

Originally, I was going to distribute the glyph list as an appendix in the module's __DATA__, but its license is unclear, so I opted to tell users where to get it from and put the module under the same terms as Perl instead.

Also, I'm unsure whether the specification on the Adobe site should be interpreted such that it allows glyph names like u00D7FF, in which case my $u_notation would require far more contortions.
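To make the leading-zero question concrete, here is a small sketch that runs a few names through the pattern as posted (written compactly, same alternatives). It then compares the result with a different technique: accept any four to six uppercase hex digits and validate the numeric value afterwards. The helper `u_name_to_codepoint` is hypothetical, and the sketch assumes the intended valid ranges are everything up to U+10FFFF except the surrogates U+D800 to U+DFFF.

```perl
use strict;
use warnings;

# The post's $u_notation, written compactly (same alternatives).
my $u_notation = qr{
    \A u
    (
        [0-9ABCEF][\dA-F]{3,5}
        |
        (?: D[0-7][\dA-F]{2,3} | D[8-9A-F][\dA-F]{3} )
    )
    \z
}x;

# Hypothetical alternative: keep the regex loose and check the value.
sub u_name_to_codepoint {
    my ($name) = @_;
    my ($digits) = $name =~ /\Au([0-9A-F]{4,6})\z/ or return;
    my $cp = hex $digits;
    return if $cp >= 0xD800 && $cp <= 0xDFFF;    # surrogate range
    return if $cp > 0x10FFFF;                    # beyond Unicode
    return $cp;
}

for my $name (qw(u00D7FF u00D800 uD800 u110000 u1F600)) {
    my $regex = $name =~ $u_notation ? 'match' : 'no match';
    my $cp    = u_name_to_codepoint($name);
    printf "%-8s regex: %-8s numeric check: %s\n",
        $name, $regex, defined $cp ? sprintf('U+%04X', $cp) : 'rejected';
}
```

As written, the posted pattern already accepts u00D7FF, but it also accepts u00D800 and u110000, which the numeric check rejects. If leading zeros are allowed, checking the value after the match may be simpler than teaching the regex about every range.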

Any comments would be most welcome.

Update: pack "U", $digits now correctly says pack "U", hex $digits. I need to write a test suite..
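A first cut at such a test suite might look like the sketch below, using the core Test::More module. It does not load the module itself: `components` and `decode` are hypothetical helpers that copy the name-splitting stages and the uniXXXX / uXXXX branches of map(), with the two patterns written compactly, so the file runs on its own. The u-notation test is the kind that would have caught the missing hex.

```perl
use strict;
use warnings;
use Test::More;

# Compact copies of the post's two patterns.
my $uni_notation = qr{\A uni ( (?: [0-9ABCEF][\dA-F]{3} | D[0-7][\dA-F]{2} )+ ) \z}x;
my $u_notation   = qr{\A u ( [0-9ABCEF][\dA-F]{3,5} | D[0-7][\dA-F]{2,3} | D[8-9A-F][\dA-F]{3} ) \z}x;

# Hypothetical helper mirroring the name-splitting stages of map():
# split on whitespace, drop everything from the first ".", split on "_".
sub components {
    return map { split /_/, $_ }
           map { /\A(.+?)\./ ? $1 : $_ }
           map { split ' ', $_ } @_;
}

# Hypothetical helper mirroring the uniXXXX / uXXXX branches of map().
sub decode {
    my ($name) = @_;
    if (my ($digits) = $name =~ $uni_notation) {
        return join '', map { pack('U', hex $_) } $digits =~ /(....)/g;
    }
    if (my ($digits) = $name =~ $u_notation) {
        return pack('U', hex $digits);
    }
    return '';
}

is_deeply [ components('f_f_i') ],        [qw(f f i)],    'ligature components split on "_"';
is_deeply [ components('Euro A_B.alt') ], [qw(Euro A B)], 'whitespace split, suffix dropped';
is_deeply [ components('.notdef') ],      ['.notdef'],    'leading "." is not treated as a suffix';

is decode('uni20AC0308'), "\x{20AC}\x{0308}", 'uni notation: one character per four digits';
is decode('u1F600'),      "\x{1F600}",        'u notation: digits go through hex()';
is decode('uniD800'),     '',                 'surrogate code point is rejected';

done_testing;
```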

Makeshifts last the longest.


In reply to (RFC) PostScript::Glyph::MapToUnicode - my first (intended-to-be) CPAN module by Aristotle
