Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution
by hippo (Archbishop) on Sep 27, 2024 at 22:37 UTC
Thanks for posting. You've clearly put some thought into this, which is a really good sign. FWIW, here is my take on your proposal(s).
Even without other considerations, I'd really like to move to "PDF" as root in namespace.
No reason not to, given that the root already exists. If it were me, I would not do the whole PDF::CAMPDF::* thing but would probably just settle for the shorter PDF::CAM::*. A second "PDF" in there just seems redundant, although others might disagree.
No, I didn't try to ask Chris.
I think it's only polite to ask, especially if you make it clear that not replying at all is perfectly fine. If an objection comes back, then you can abide by it, but if the reply is "Actually, under CAM::PDF is the perfect home for it" then you're done and dusted.
On the other hand, "Extended" is kind of dull and about nothing.
This is the part where perhaps a positive change might be made. "Extended" hints at some extra functionality but not at what that is or how it relates to the parent module. Have you just introduced more subs, or have you overridden existing ones? Is there something which ties all this new code together? Can you sum up in one sentence why any given programmer would use the new module over the existing one (or vice versa)? Perhaps then we can all have a think and come up with something more descriptive (or, as you say, at least fun).
|
Thanks for the advice. You are right about being polite; I'll write to Chris shortly.
Can you sum up in one sentence...
No new functions yet, but: smaller output PDF files thanks to modern compression; some bugs fixed (and, no doubt, some added); faster opening and parsing (and writing), sometimes significantly; and somewhat less of a RAM hog.
Long version:
CAM/PDF.pm contains 133 subroutines. The API section of the POD enumerates 77 entries (though some of them use wildcards to designate several), and 104 methods are described in detail further on, some marked strictly internal. Plus, there are several supporting modules in the distribution with quite a few subs of their own. My module, close to 100% complete, has 37 subroutines. Most of them replace originals, either as complete rewrites or lightly touched, and some are internal helper subs. At this stage there are actually no new functions for the end user; added behaviour is triggered by options passed to the constructor. Or rather, let's start with this: I'm planning for it to be backward compatible. Whenever someone does:
use CAM::PDF;
my $pdf = CAM::PDF->new( $fn_or_stdin_or_pdf, @other_args );
# ... things happen with $pdf
they can substitute my module's name and their code works as before. New features are opt-in, with documented exceptions. On the external side, one such feature is the ability to save as PDF version 1.5+ with a compressed cross-reference stream and object streams, and thus a smaller file size. To my knowledge, no CPAN module can do that. (Strictly speaking, PDF::API2 can append an xref stream on incremental update; but it can only update, not save "cleanly" and optimize, so the file size can only grow. Plus, it's funny to see a screenful or two of your own code in an "alien" environment, because the patch was written by yours truly.) A less externally visible option is "no slurping on open", but it only matters for really huge input with high-resolution images; a typical office or business PDF's file size is dwarfed by what CAM::PDF itself consumes. Speaking of which, I changed some of its internal accounting and procedures to improve performance.
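A minimal sketch of the opt-in pattern described above, assuming options arrive as a trailing hashref and every option defaults to the parent's behaviour. `ParentPDF` is a hypothetical stand-in for CAM::PDF (so the sketch runs without CPAN), `DropIn` stands in for the subclass, and the option names `slurp` and `xref_streams` are illustrative; only `slurp` appears elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parent class (CAM::PDF in the real case).
package ParentPDF;
sub new {
    my ( $class, $source, @rest ) = @_;
    return bless { source => $source, args => [@rest] }, $class;
}

# Hypothetical drop-in subclass: a trailing hashref is treated as options;
# anything not requested keeps the parent's (legacy) behaviour.
package DropIn;
use parent -norequire, 'ParentPDF';

my %DEFAULTS = (
    slurp        => 1,    # legacy: read the whole file on open
    xref_streams => 0,    # legacy: classic xref table, pre-1.5 output
);

sub new {
    my ( $class, @args ) = @_;
    my %opt = %DEFAULTS;
    if ( @args and ref $args[-1] eq 'HASH' ) {
        my $user = pop @args;
        for my $key ( keys %$user ) {
            die "unknown option '$key'\n" unless exists $DEFAULTS{$key};
            $opt{$key} = $user->{$key};
        }
    }
    my $self = $class->SUPER::new(@args);    # parent sees its usual arguments
    $self->{opt} = \%opt;
    return $self;
}

package main;

my $old = DropIn->new('in.pdf');                      # unchanged calling code
my $new = DropIn->new( 'in.pdf', { slurp => 0 } );    # opt-in feature
```

Unknown keys die early so that a typo doesn't silently fall back to legacy behaviour. If the parent constructor already accepts its own trailing options hashref, the two sets of keys would need to be merged or kept apart; the sketch does not attempt that.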
|
Since the changes are mostly internal, with no updated interface, and you plan to make it backward compatible (i.e. any changed behavior is opt-in and there is no change in prereqs), why not ask permission to take over as the maintainer of the original distribution?
|
one such feature is ability to save to 1.5+ version with compressed cross-reference stream and object streams -- thus, smaller file size
Hopefully this will be a user-configurable option. There are still many systems that only work with the older versions.
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution
by stevieb (Canon) on Nov 05, 2024 at 07:54 UTC
I am very impressed with the significant amount of due diligence you've done here trying to sort this out. It reminds me of me in my early days.
You are exceptionally articulate, detailed (yet quite concise), a big-picture, long-term thinker, and respectful of prior art. This is very impressive.
My gut says that with all the thought that went into just figuring out whether you should extend a current namespace, you should come up with your own (under "PDF") that doesn't include any acronyms that aren't particular to your distribution. I also think that you should request MAINT access to the existing distribution; not necessarily for updating it, but perhaps to update the README so that it marks the old software as either (A) deprecated (because your software is backward compatible) or (B) antiquated (because your software does *most* of what the older one does, but maybe not everything), in either case with a link to your distribution.
I stood on my head for my first couple of CPAN releases, waiting for feedback from others on how I should proceed. I was scared shitless. I was afraid of infringing on others' distribution names, their work, and perhaps most of all, breaching the etiquette of creating a new CPAN name for my work.
Don't overthink it. You've worked overtime to ensure you're within the guidelines, and you've gotten wonderful feedback from many people.
Just do it. Get your distribution up there in what makes you feel best, and be very proud to be an official open source Perl CPAN contributor!
-stevieb
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution
by Anonymous Monk on Oct 28, 2024 at 09:51 UTC
Dear Monks, I don't know whether I should continue here or ask a separate SoPW question, especially since it's not strictly about Perl. Should I completely drop support for CAM::PDF's (very obsolete and perhaps buggy) implementation of PDF encryption, or keep dragging it along into my subclass? Will anyone miss it? Or should I restrict, i.e. isolate, this "encryption" from my new features, emitting warnings now and then ("don't use it here", "don't do that", etc.)?
CAM::PDF claims read-write support for 40-bit RC4 encryption and read-only support for 128-bit (non-AES) encryption; nothing more modern. In my tests it does open encrypted files of both kinds, but only if the owner and user passwords are the same. For the 40-bit variant it can also open a file with a different O/U pair of passwords, but succeeds only if the U/U pair of strings is passed to the constructor. Weird.
When CAM::PDF is used to encrypt unencrypted PDF files with P/P (i.e. identical O/U) passwords, Adobe Reader (or a modern browser) opens the resulting file fine after asking for a password. With an O/U (different) pair, Reader accepts either the "U" or the "O" password but displays a blank page. In fact, Reader always accepts either of the two when another authoring application was used to encrypt the file (and then displays it correctly), so the comment at line 2951 is moot. "Perhaps a bug", indeed.
I can successfully run CAM::PDF's test suite with "CAM::PDF" replaced by "PDF::CAMPDF::X" in the test files. Some of the tests are about setting PDF permissions/passwords, then saving and re-reading scalars. But I'm unhappy about the compromises I had to make to keep working something I consider obsolete and buggy, and am not interested in. Also, my idea was to inherit the test suite (at least for now); if the "encryption/permissions" part is struck out, it gets rather slim.
Personally, I find PDF encryption a source of confusion/irritation at best and misery at worst, and it is an absolute taboo in my working life (pre-press). But perhaps it's better not to be too hasty in throwing "encryption" away? Is PDF encryption, modern or not, actually widely used in, eh-m, the "first world", considering the effort CAM obviously put into developing it? Or is it a niche thing no one will miss?
|
Is PDF encryption, modern or not, actually widely used
IME, no. I suspect that this is largely because it suffers from the requirement to share the password with everyone you wish to be able to read the document. And if you are going to share the password securely, why not just share the document securely? When I do need to share a PDF securely with someone, I just encrypt it with their PGP public key.
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution
by Anonymous Monk on Nov 13, 2024 at 10:12 UTC
Thanks, everyone, for the advice and encouragement. Eh-m, back to square one (I've been asked to drop the acronym): is PDF::Manipulate (or rather, PDF::Manipulate::X) an OK name? (Against: a verb as the second term? Too long? Negative connotations?) I considered "Parser", "Wielder" and "ReadWrite", but like them even less. "Manipulate" appears in the very first sentence of the CAM::PDF POD page. In essence, I consider (and use) CAM::PDF as a low-level parser and manipulator with a few high-level methods: a tool for other developers to build (high-level) tools.
Why "X"? PDF::CAMPDF::X would have eXtended CAM::PDF to serve as a drop-in replacement in any existing code. PDF::CAMPDF::X2 would, in turn, subclass it but get rid of CAM::PDF::Node (hashrefs with a fixed set of keys, blessed for no reason) and use arrayrefs for "nodes" instead. At first it would be an exploration of how much of a performance boost that gives. It would be advised for new code, but it can't serve as a drop-in CAM::PDF replacement: see e.g. line 82 (and many more) in CAM::PDF::Annot, which accesses "{value}" directly; poking inside "nodes" is all over any code that uses CAM::PDF.
So the idea now is to have PDF::Manipulate::X and PDF::Manipulate::X2 in the same distribution, at the same depth in the namespace, unless I'm strongly advised not to. I'll think about how to document everything properly and carefully.
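To illustrate why an arrayref-based node layout can't be a drop-in replacement, here is a sketch of both shapes side by side. The hashref keys (`type`, `value`, `objnum`, `gennum`) are assumed to mirror CAM::PDF::Node, and `hash_node`, `array_node`, the `HashNode` class and the slot constants are hypothetical names; the sketch makes no performance claim.

```perl
use strict;
use warnings;

# Hashref node, assumed to mirror CAM::PDF::Node's layout; client code
# commonly reaches in with $node->{value}.
sub hash_node {
    my ( $type, $value, $objnum, $gennum ) = @_;
    return bless {
        type   => $type,
        value  => $value,
        objnum => $objnum,
        gennum => $gennum,
    }, 'HashNode';
}

# Hypothetical arrayref layout for the "X2" idea: fixed slots addressed by
# compile-time constants instead of hash keys.
use constant { TYPE => 0, VALUE => 1, OBJNUM => 2, GENNUM => 3 };

sub array_node {
    my ( $type, $value, $objnum, $gennum ) = @_;
    return [ $type, $value, $objnum, $gennum ];
}

my $hn = hash_node( 'number', '3.14', 5, 0 );
my $an = array_node( 'number', '3.14', 5, 0 );

print "hash:  $hn->{value}\n";            # existing-style access
print 'array: ', $an->[VALUE], "\n";      # new-style access

# Existing-style access on the new layout dies at run time under strict.
my $ok = eval { my $v = $an->{value}; 1 };
print $ok ? "old access worked\n" : "old access died: $@";
```

Any client code doing `$node->{value}`, as in the CAM::PDF::Annot case mentioned above, dies with "Not a HASH reference" when handed an arrayref node, which is why such a change fits a separate, non-compat class.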
|
is PDF::Manipulate (rather, PDF::Manipulate::X) an OK name? (Contra: Verb as 2nd term? Too long? Negative connotations?)
"Manipulate" seems OK to me. If you think it is overly long then "Manip" is also fine (See Date::Manip, Bit::Manip, et al).
I considered "Parser", or "Wielder", or "ReadWrite", but like them even less.
Simply, "Edit"?
Honestly, it's the ::X and ::X2 which are slightly more confusing anyway. I would probably be more inclined to go with PDF::Foo::Compat for the drop-in replacement one and just PDF::Foo for the forward-looking one (where Foo is one of Manipulate/Manip/Edit/...). How does that sound?
|
Thank you very much, hippo; I like PDF::Manip a lot. Somehow the modules you mentioned slipped my mind, and MetaCPAN's search for "Manipulate" didn't show them in the first hundred results. Very respectable company for my distribution, then :-). "Edit" is not so apt, because "editing" may falsely imply the ability to, e.g., "search for the word 'Perl' and make it bold". I also see good reasons for having Compat in the module's name. Originally I intended (reverting for a moment to the previous set of names) to show PDF::CAMPDF::X in the Synopsis, etc., and only later/gradually advise using X2 in new code. For usual/small PDF files X2 is not much of an advantage, and not having to learn the difference means less confusion. But now I see that perhaps you are right: I owe would-be users a fair amount of explaining anyway.
|
Consider ::Manip over ::Manipulate, since it's more common among other CPAN modules.
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution
by Anonymous Monk on Oct 30, 2024 at 10:40 UTC
A question about Perl this time. Is the following an OK solution (and backward compatible to, e.g., 5.010) for writing numeric values exactly as they were parsed, be it 3.14 or 3.14159, i.e. without forcing rounding to an arbitrarily "sane" length? Results of numeric calculations still have to be formatted. The fallback test for "e" is for values from whatever source whose history lets Perl treat them as strings: scientific notation is a syntax error in PDF. CAM::PDF just does "$val", which I don't think is good enough.
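do {
    use B;
    # Reformat only values without the string (POK) flag, i.e. genuine
    # numbers such as calculation results, or strings containing an
    # exponent (lowercase 'e' only), which is invalid PDF number syntax
    if ( not B::SVf_POK & B::svref_2object( \$val )->FLAGS or
         -1 != index $val, 'e' ) {
        $val = sprintf '%.4f', $val;   # fixed point, four decimals
        $val =~ s/\.?0+$//             # trim trailing zeros and a bare '.'
    }
    "$val"                             # parsed strings pass through as-is
}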
|
use strict;
use warnings;
use Test::More tests => 1;
my $val = 3.14159;
my $oldval = $val;
my $newval = do {
use B;
if ( not B::SVf_POK & B::svref_2object( \$val )->FLAGS or
-1 != index $val, 'e' ) {
$val = sprintf '%.4f', $val;
$val =~ s/\.?0+$//
}
"$val"
};
is "$oldval", "$newval";
I think you have to either "round to an arbitrarily sane length" as your code does or else just go with the flow.
|
Sorry, I wasn't clear. The "numeric" values are the result of parsing, but are kept internally as strings (directly from capture buffers), and are treated as numbers only if the user performs some calculation. CAM::PDF enforces no encapsulation, and the user can replace/assign some of these values (or create new ones), hopefully as actual numbers, but perhaps also as "numeric" strings. The idea was for original values to be "passed through" unmodified when writing a new file or stringifying content. Ugly strings such as "3.141592653589793", if supplied by the user, will sneak into the output, but I guess I'll let that be; it's syntactically OK. Otherwise it's too much hassle to really keep track of what came from parsing the source and what was added by the user.
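A possible alternative to inspecting SV flags, sketched under the pass-through requirement just described: check the stringified value against PDF's number syntax (optional sign, digits, optional decimal point, no exponent) and reformat only when it fails. `pdf_number` is a hypothetical helper; it uses only core features available in 5.010, does not handle `Inf`/`NaN`, and a non-numeric string will warn and come out as `0`.

```perl
use strict;
use warnings;

# PDF number syntax: optional sign, digits with an optional decimal point
# ("4.", ".5" and "-3.14" are all valid), and no exponent.
my $PDF_NUM = qr/\A[+-]?(?:\d+(?:\.\d*)?|\.\d+)\z/;

# Hypothetical helper: pass syntactically valid values through verbatim,
# reformat everything else (e.g. "1e-05", "1.5E3") as fixed point.
sub pdf_number {
    my ($val) = @_;
    my $str = "$val";
    return $str if $str =~ $PDF_NUM;

    my $fmt = sprintf '%.4f', $val;    # always contains a '.'
    $fmt =~ s/\.?0+\z//;               # trim trailing zeros and a bare '.'
    $fmt = '0' if $fmt eq '-0';        # tiny negatives would give "-0"
    return $fmt;
}

print pdf_number($_), "\n" for '3.14', '3.14159', 3.14159, 0.00001, '1.5E3';
```

With this check a parsed "3.14159" and a computed 3.14159 both pass through verbatim, while 0.00001 (which Perl stringifies as "1e-05") becomes `0`. The trade-off is that long computed values are not rounded, which is consistent with letting "3.141592653589793" through.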
|
Another question (hopefully these aren't regarded as "do the research for me" or "write code for me"). With regard to output, is adding a dependency on File::Copy for a trivial task a bad idea? Is "append mode" perhaps considered undesirable on file systems I have no experience with?
Originally, a file is always slurped in, and the content scalar is either appended to or regenerated from scratch, for an incremental or a clean save respectively; then the output file is opened in create mode and the scalar is printed.
In the worst case (a minute incremental update to a very large file, saved under the same file name), an almost exact huge copy is written anew. I don't like this design and rewrote this part. An incremental update saved to the same file name opens the file in append mode and writes only as much data as required. An incremental update saved to a different file name calls File::Copy::copy() first. This is especially useful if the original file wasn't slurped in.
I could do without File::Copy, using a read/print loop through some small buffer instead (16 KB, or whatever a sensible value is these days), but why shouldn't I use a dedicated, optimised module? Just to avoid adding a dependency? On a core module?
As for "no appending" in the original CAM::PDF, I can envision a case where the file is modified by someone between my read and my write; a corrupt PDF would then be saved. But that's no worse than my erasing that someone's work by opening the file in create mode. Both scenarios are fairly improbable for PDFs on disk.
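A sketch of the incremental-save strategy described above: append in place when saving under the same name, otherwise copy with `File::Copy::copy` first and then append. `incremental_save` is a hypothetical helper, not CAM::PDF's API, and the demo uses a 100-byte stand-in for a PDF. It compares paths as plain strings, so two different spellings of the same file would be treated as different files. File::Copy's `copy` also accepts an optional third argument, a buffer size, if the default doesn't suit.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: append an incremental-update section to a PDF.
# If the target is a different file, copy the original there first;
# either way only the new bytes are written through the append handle.
sub incremental_save {
    my ( $src, $dst, $update ) = @_;
    if ( $dst ne $src ) {    # naive path comparison
        copy( $src, $dst ) or die "copy '$src' -> '$dst' failed: $!";
    }
    open my $fh, '>>:raw', $dst or die "cannot append to '$dst': $!";
    print {$fh} $update or die "write to '$dst' failed: $!";
    close $fh or die "close of '$dst' failed: $!";
    return -s $dst;
}

# Demo: a 100-byte stand-in "PDF" and a 10-byte update.
my $dir = tempdir( CLEANUP => 1 );
my $src = File::Spec->catfile( $dir, 'in.pdf' );
my $dst = File::Spec->catfile( $dir, 'out.pdf' );
open my $out, '>:raw', $src or die "cannot create '$src': $!";
print {$out} 'A' x 100;
close $out or die "close of '$src' failed: $!";

my $copied  = incremental_save( $src, $dst, 'B' x 10 );    # copy + append
my $src_now = -s $src;                                      # original untouched
my $inplace = incremental_save( $src, $src, 'C' x 10 );    # append in place
printf "copied: %d, original: %d, in place: %d\n", $copied, $src_now, $inplace;
```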
|
is adding a dependency on File::Copy for trivial task a bad idea?
Note that it's included in core:
$ corelist File::Copy
Data for 2024-10-20
File::Copy was first released with perl 5.002
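The same check can be done from Perl with Module::CoreList (itself a core module), for instance to confirm that File::Copy predates the 5.010 compatibility target mentioned earlier in the thread. A minimal sketch; the `$min_perl` value is simply that target:

```perl
use strict;
use warnings;
use Module::CoreList;

# first_release() returns the first perl version that shipped the module,
# or undef if it has never been in core.
my $since    = Module::CoreList->first_release('File::Copy');
my $min_perl = '5.010';    # compatibility target mentioned in this thread

die "File::Copy is not a core module\n" unless defined $since;
printf "File::Copy first released with perl %s; %s\n", $since,
    $since <= $min_perl
    ? "already core in $min_perl"
    : "NOT core in $min_perl";
```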
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution
by Anonymous Monk on Dec 17, 2024 at 00:05 UTC
Back in business after a distraction. Tutorials and guides demand tests. Stealing the parent's tests doesn't look good (I'm going to do it anyway), so I'll write something for the added features. How do I test whether "no slurping" mode is actually functional? Is what follows overkill? Are there blunders; can it be improved? Have I overlooked something on CPAN?
Btw, the PerlIO::via docs leave a few things to be desired. PUSHED is said to return "an object or the class". "Class" what? I could only satisfy it with a blessed whatever, even one as ridiculous as below. READ's argument is called "$buffer", but I only succeeded with not-very-nice direct access to an @_ element (won't Perl::Critic faint at that?). And using the package as written as an I/O layer and then reading with the diamond operator (not related to my tests) yields strings of random length, which is very weird. Perhaps it should be mentioned that without FILL defined (READ only), readline is unusable? It's strange that the few core and CPAN modules I checked all work through FILL; none of them uses READ.
use strict;
use warnings;
use feature 'say';
use Test::More;
BEGIN {
package Counter;
use strict;
use warnings;
our $count;
sub PUSHED {
$count = 0; return bless \(1 + 1), 1
}
sub READ {
my ( undef, $buffer, $len, $fh ) = @_;
my $bytes = read $fh, $_[1], $len;
$count += $bytes;
return $bytes
}
sub BINMODE { 0 }
sub SEEK {
CORE::seek( $_[3], $_[1], $_[2] )
}
}
BEGIN {
*CORE::GLOBAL::open = sub {
splice @_, 1, 1, '<:raw:via(Counter)'
if defined $_[1] and $_[1] eq '<';
&CORE::open
}
};
my $fname = 't/sample1.pdf';
die unless -e $fname;
my $fsize = -s _;
my $doc;
use CAM::PDF;
$doc = CAM::PDF-> new( $fname );
ok $doc, "file reading OK with CAM::PDF";
cmp_ok $Counter::count, '==', $fsize,
"had to read $Counter::count bytes of total $fsize";
$doc-> getPageDimensions( 1 );
cmp_ok $Counter::count, '==', $fsize,
"for info 1, cumulative read is OK: $Counter::count";
$doc-> getPageDimensions( 2 );
cmp_ok $Counter::count, '==', $fsize,
"for info 2, cumulative read is OK: $Counter::count";
$doc-> cleansave;
cmp_ok $Counter::count, '==', $fsize,
"for all info, cumulative read is OK: $Counter::count";
# done_testing;
### you may wish to delete to the end of file for SSCCE,
### PDF::Manip is not released yet
use lib 'lib';
use PDF::Manip;
# $doc = PDF::Manip-> new( $fname );
# ok $doc, "file reading OK with PDF::Manip";
# cmp_ok $Counter::count, '==', $fsize,
# "had to read $Counter::count bytes of total $fsize";
# $doc-> getPageDimensions( 1 );
# cmp_ok $Counter::count, '==', $fsize,
# "for info 1, cumulative read is OK: $Counter::count";
# $doc-> getPageDimensions( 2 );
# cmp_ok $Counter::count, '==', $fsize,
# "for info 2, cumulative read is OK: $Counter::count";
# $doc-> cleansave;
# cmp_ok $Counter::count, '==', $fsize,
# "for all info, cumulative read is OK: $Counter::count";
$doc = PDF::Manip-> new( $fname, { slurp => 0 });
ok $doc, "file reading OK with PDF::Manip (no slurping)";
cmp_ok $Counter::count, '<', $fsize,
"had to read $Counter::count bytes of total $fsize";
$doc-> getPageDimensions( 1 );
cmp_ok $Counter::count, '<', $fsize,
"for info 1, cumulative read is OK: $Counter::count";
$doc-> getPageDimensions( 2 );
cmp_ok $Counter::count, '<', $fsize,
"for info 2, cumulative read is OK: $Counter::count";
$doc-> cleansave;
cmp_ok $Counter::count, '>=', $fsize,
"for all info, cumulative read is OK: $Counter::count";
done_testing;
__END__
ok 1 - file reading OK with CAM::PDF
ok 2 - had to read 621710 bytes of total 621710
ok 3 - for info 1, cumulative read is OK: 621710
ok 4 - for info 2, cumulative read is OK: 621710
ok 5 - for all info, cumulative read is OK: 621710
ok 6 - file reading OK with PDF::Manip (no slurping)
ok 7 - had to read 3487 bytes of total 621710
ok 8 - for info 1, cumulative read is OK: 3593
ok 9 - for info 2, cumulative read is OK: 3702
ok 10 - for all info, cumulative read is OK: 623916
1..10
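For comparison with the READ-based layer above, here is a minimal FILL-based byte-counting layer. It assumes, as the PerlIO::via docs describe, that PUSHED is called as a class method and that FILL returns the next chunk (undef at end of file); returning an object blessed into the class passed to PUSHED is the common idiom. `ByteCounter` and the 4096-byte chunk size are illustrative choices, and SEEK/TELL are deliberately omitted, so unlike the layer above this one only suits sequential reads.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A minimal byte-counting PerlIO::via layer built on FILL instead of READ.
{
    package ByteCounter;

    our $count = 0;

    # PUSHED is a class method: return an object blessed into that class.
    sub PUSHED {
        my ( $class, $mode, $fh ) = @_;
        $count = 0;
        return bless {}, $class;
    }

    # FILL runs whenever the layer's buffer is empty: return the next chunk
    # from the handle below, or undef at end of file / on error.
    sub FILL {
        my ( $self, $fh ) = @_;
        my $n = read $fh, my $chunk, 4096;
        return unless $n;
        $count += $n;
        return $chunk;
    }
}

# Demo: 18 bytes on disk, read back line by line through the layer.
my ( $tmp, $fname ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "line one\nline two\n";
close $tmp or die "close: $!";

open my $in, '<:raw:via(ByteCounter)', $fname or die "open: $!";
my @lines = <$in>;    # readline works because FILL is defined
close $in;
print scalar(@lines), " lines, $ByteCounter::count bytes fetched\n";
```

Because FILL pulls whole chunks from the handle below, the counter measures bytes fetched from disk rounded up to the chunk size, not bytes the caller actually consumed; for a "read less than the whole file" test, the chunk size bounds the overshoot.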