Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by hippo (Archbishop) on Sep 27, 2024 at 22:37 UTC
Thanks for posting. You've clearly put some thought into this, which is a really good sign. FWIW, here is my take on your proposal(s).

> Even without other considerations, I'd really like to move to "PDF" as root in namespace.

No reason not to, given that the root already exists. If it were me I would not be doing the whole `PDF::CAMPDF::` thing but would probably just settle for the shorter `PDF::CAM::`. A second "PDF" in there just seems redundant, although others might disagree.

> No, I didn't try to ask Chris.

I think it's only polite to ask, especially if you make it clear that even not replying is perfectly fine. If an objection comes back then you can abide by that, but if the reply is "Actually, under CAM::PDF is the perfect home for it" then you're done and dusted.

> On the other hand, "Extended" is kind of dull and about nothing.

This is the part where perhaps a positive change might be made. "Extended" hints at some extra functionality but not what that is or how it relates to the parent module. Have you just introduced more subs or have you overridden existing ones? Is there something which ties all this new code together? Can you sum up in one sentence why any given programmer would use the new module over the existing one (or vice versa)? Perhaps then we can all have a think and come up with something more descriptive (or as you say, at least fun).

🦛
Re^2: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Sep 28, 2024 at 12:45 UTC
Thanks for the advice. You are right about being polite; I'll write to Chris shortly.

> Can you sum up in one sentence...

No new functions yet, but: smaller final PDF file size because of modern compression; some bugs fixed (no doubt, some added); faster opening and parsing (and writing), sometimes significantly; somewhat less of a RAM hog.

Long version: `CAM/PDF.pm` contains 133 subroutines; 77 entries are enumerated in the API section of the POD (though some of them use wildcards to designate several), and there are 104 methods further described in detail, some marked strictly internal. Plus there are several supporting modules in the distribution with quite a few subs of their own. My module, close to 100% complete, has 37 subroutines. Most of them are replacements of originals, either complete rewrites or just touched up, and some are internal helper subs. No new functions for the end user, actually, at this stage. Added behaviour is triggered using options passed to the constructor.

Or better, let's start with this: I'm planning it to be backward compatible. Whenever someone does:

```perl
use CAM::PDF;
my $pdf = CAM::PDF->new( $fn_or_stdin_or_pdf, @other_args );
# ... things happen with $pdf
```

they substitute my module name and their code works as before. New features are opt-in, with documented exceptions.

On the external side, one such feature is the ability to save to PDF version 1.5+ with a compressed cross-reference stream and object streams, and thus a smaller file size. No CPAN module can do that, to my knowledge. (Rather, `PDF::API2` can append an xref stream on incremental update; but it can only update, not save "cleanly" and optimize, so file size can only grow. Plus, it's funny to see a screenful or two of your code in an "alien" environment, because the patch was written by yours truly.)

A slightly less external option is "no slurping on open", but it only matters for really huge input with high resolution images. The size of a typical office or business PDF file is dwarfed by what `CAM::PDF` consumes itself. Speaking of which, I changed some of its internal accounting and procedures to improve performance.
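The "drop-in subclass with opt-in options" idea described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `My::Parent` and `My::Child` are hypothetical stand-ins (the real parent would be `CAM::PDF`), the `mode` method is invented for the demo, and the trailing-hashref convention with a `slurp` key is borrowed from the test script later in this thread rather than from any released API.

```perl
use strict;
use warnings;

# Stand-in parent so the sketch runs on its own (hypothetical; in the
# thread's scenario this role is played by CAM::PDF).
{
    package My::Parent;
    sub new {
        my ( $class, $source, @other_args ) = @_;
        return bless { source => $source, args => [@other_args] }, $class;
    }
    sub mode { return 'slurp' }
}

# Hypothetical drop-in subclass: same constructor signature as the parent,
# plus an optional trailing hashref of opt-in switches.
{
    package My::Child;
    our @ISA = ('My::Parent');

    sub new {
        my ( $class, $source, @other_args ) = @_;

        # Peel off a trailing hashref if present; old-style calls have none.
        my $opts = ( @other_args && ref $other_args[-1] eq 'HASH' )
            ? pop @other_args
            : {};

        my $self = $class->SUPER::new( $source, @other_args );
        $self->{opts} = { slurp => 1, %$opts };    # defaults mirror the parent
        return $self;
    }

    # Overridden method: falls back to the parent unless the caller opted in.
    sub mode {
        my ($self) = @_;
        return $self->{opts}{slurp} ? $self->SUPER::mode() : 'on-demand';
    }
}

my $old_style = My::Child->new('in.pdf');
my $opted_in  = My::Child->new( 'in.pdf', { slurp => 0 } );

print $old_style->mode, "\n";    # prints "slurp"
print $opted_in->mode,  "\n";    # prints "on-demand"
```

Existing callers that only swap the class name never pass the hashref, so they keep the parent's behaviour; only code that explicitly opts in sees the new path.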
Re^3: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Sep 28, 2024 at 17:15 UTC
Since it's mostly internal, with no updated interface, and you plan to make it backwards compatible (i.e. any changed behavior is opt-in and there is no change in prereqs), why not ask permission to take over as the maintainer of the original distribution?
Re^4: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Danny (Chaplain) on Sep 28, 2024 at 17:55 UTC
Re^3: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by sleet (Monk) on Oct 30, 2024 at 11:09 UTC
> one such feature is ability to save to 1.5+ version with compressed cross-reference stream and object streams -- thus, smaller file size

Hopefully this will be a user configurable option. There are still many systems that only work with the older versions.
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by stevieb (Canon) on Nov 05, 2024 at 07:54 UTC
I am very impressed with the significant amount of due diligence you've done here trying to sort this out. It reminds me of me in my early days. You are exceptionally articulate, detailed (yet quite concise), big-picture/long-term thinking and respectful of prior art. This is very impressive.

My gut says that with all the thought that went into just figuring out if you should extend a current namespace, you should come up with your own (under "PDF"), one that doesn't include any acronyms that aren't particular to your distribution.

I also think that you should request MAINT access to the existing distribution; not necessarily for updating it, but perhaps to update the README so that it describes the old software as either A: deprecated (because your software is backward compatible) or B: antiquated (because your software does most of what the older one does, but maybe not everything), in either case with a link to your distribution.

I stood on my head for my first couple of CPAN releases, waiting for feedback from others on how I should proceed. I was scared shitless. I was afraid of infringing on others' distribution names, their work, and perhaps mostly, breaching the conduct of creating a new CPAN name for my work.

Don't overthink it. You've worked overtime to ensure you're within the guidelines, and you've gotten wonderful feedback from many people. Just do it. Get your distribution up there under whatever name makes you feel best, and be very proud to be an official open source Perl CPAN contributor!

-stevieb
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Oct 28, 2024 at 09:51 UTC
Dear Monks, I don't know if I should continue here or ask a separate SOPW question, especially as it's not strictly about Perl.

Should I completely drop support for the (very obsolete and perhaps buggy) implementation of PDF encryption in `CAM::PDF`, or continue to drag it along into my sub-class? Will anyone miss it, or should I restrict, i.e. isolate, this "encryption" from my new features, emitting warnings now and then ("don't use it here, don't do that", etc.)?

`CAM::PDF` claims to support 40-bit RC4 encryption for reading and writing, and 128-bit non-AES for reading only. Nothing more modern. In my tests, it does open encrypted files of both kinds, but only if the owner and user passwords were the same; for the 40-bit variant, it can also open a file with a different O/U pair of passwords, but succeeds only if a U/U pair of strings is provided to the constructor. Weird.

When `CAM::PDF` is used to encrypt non-encrypted PDF files with P/P (i.e. same O/U) passwords, the produced file is opened OK by Adobe Reader (or modern browsers) after requesting a password. For O/U (a different pair), Reader accepts either the "U" or the "O" password but displays a blank page. Actually, the Reader always accepts either of the two if another authoring app was used to encrypt (and displays the file correctly), so the comment in line 2951 is moot. "Perhaps a bug", indeed.

I can successfully run `CAM::PDF`'s test suite with "`CAM::PDF`" replaced with "`PDF::CAMPDF::X`" in the test files. Some of the tests are about setting PDF permissions/passwords, then saving and re-reading scalars. But I'm unhappy because of the compromises I had to make to keep functional something I consider obsolete and buggy, and am not interested in. Also, my idea was to inherit the test suite (at least, for now); if the "encryption/permissions" part is struck out, it gets somewhat slim.

Personally, I find PDF encryption a source of confusion/irritation at best and misery at worst, and it is an absolute taboo in my working life (pre-press). But I don't know, perhaps it's better not to be too hasty (throwing "encryption" away)? Is PDF encryption, modern or not, actually widely used in, eh-m, the "first world"? Considering the effort obviously put into development by CAM? Or is it a niche thing no one will miss?
Re^2: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by hippo (Archbishop) on Oct 28, 2024 at 11:07 UTC
> Is PDF encryption, modern or not, actually widely used

IME, no. I suspect that this is largely because it suffers from the requirement to share the password with those others you wish to be able to read the document. And if you are going to share the password securely, why not just share the document securely? When I do have a need to share a PDF securely with someone I just encrypt it with their PGP public key.

🦛
Re^3: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by ysth (Canon) on Oct 28, 2024 at 15:11 UTC
I don't know that there are good reasons to encrypt PDFs, but it does still happen. I once received an encrypted PDF in response to a FOIA request, and once was emailed a medical bill as an encrypted PDF.

-- A math joke: r = | |csc(θ)|+|sec(θ)| |-| |csc(θ)|-|sec(θ)| |
Re^4: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by hippo (Archbishop) on Oct 28, 2024 at 15:21 UTC
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Nov 13, 2024 at 10:12 UTC
Thanks, everyone, for the advice and encouragement.

Eh-m, back to square one (I've been asked to drop the acronym): is `PDF::Manipulate` (rather, `PDF::Manipulate::X`) an OK name? (Contra: Verb as 2nd term? Too long? Negative connotations?) I considered "Parser", or "Wielder", or "ReadWrite", but like them even less. "Manipulate" is in the very first sentence on the `CAM::PDF` POD page. In essence, I consider (and use) `CAM::PDF` as a low-level parser and manipulator with a few high-level methods, a tool for other developers to make (high-level) tools.

Why "X"? `PDF::CAMPDF::X` would have eXtended `CAM::PDF` to serve as a drop-in replacement in any existing code. `PDF::CAMPDF::X2` would, in turn, subclass it but get rid of `CAM::PDF::Node` (blessed-for-no-reason hashrefs with a constant set of keys) and use arrayrefs for "nodes". An exploration at first, to see how much of a performance boost that would add. It would be advised for new code, but can't serve as a drop-in `CAM::PDF` replacement. Because see e.g. line#82, and many more, in `CAM::PDF::Annot` accessing "`{value}`"; poking inside "nodes" is all over any code using `CAM::PDF`.

So the idea now is to have `PDF::Manipulate::X` and `PDF::Manipulate::X2` in the same distribution and kind of on the "same depth level". Unless, that is, I'm strongly advised not to. I'll think about how to properly and carefully document everything.
Re^2: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by hippo (Archbishop) on Nov 13, 2024 at 10:28 UTC
> is `PDF::Manipulate` (rather, `PDF::Manipulate::X`) an OK name? (Contra: Verb as 2nd term? Too long? Negative connotations?)

"Manipulate" seems OK to me. If you think it is overly long then "Manip" is also fine (see `Date::Manip`, `Bit::Manip`, et al).

> I considered "Parser", or "Wielder", or "ReadWrite", but like them even less.

Simply, "Edit"?

Honestly, it's the `::X` and `::X2` which are slightly more confusing anyway. I would probably be more inclined to go with `PDF::Foo::Compat` for the drop-in replacement one and just `PDF::Foo` for the forward-looking one (where Foo is one of Manipulate/Manip/Edit/...). How does that sound?

🦛
Re^3: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Nov 13, 2024 at 17:04 UTC
Thank you very much, hippo, I like `PDF::Manip` a lot. Somehow the modules you mentioned slipped my mind, and metacpan's search for "Manipulate" didn't show them in the first hundred results. Very respectable company for my distribution, then :-). `Edit` is not so apt, because "editing" may falsely imply the ability to e.g. "search for the word 'Perl' and make its font bold-faced".

I also see good reasons/logic for having `Compat` in the module's name. Originally I intended (restoring for a moment the previous name-set) to have `PDF::CAMPDF::X` in the Synopsis, etc., and only later/gradually advise using `X2` in new code. For usual/small PDF files, `X2` is not much of an advantage, and there would be less confusion from having to learn the difference. But now I see perhaps you are right; I owe a fair amount of explaining to would-be users anyway.
Re^2: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by sleet (Monk) on Nov 13, 2024 at 14:07 UTC
Consider `::Manip` over `::Manipulate`, since it's more common among other CPAN modules.
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Oct 30, 2024 at 10:40 UTC
A question about Perl this time. Is the following an OK solution (and backward compatible to e.g. 5.010) for writing numeric values as they were parsed, be it either 3.14 or 3.14159, i.e. not forcing rounding to an arbitrarily "sane" length? Results of numeric calculations still have to be formatted. The emergency test for "e" is for values from whatever sources whose past allows Perl to treat them as strings. Scientific notation is a syntax error in PDF. `CAM::PDF` does just `"$val"`, and I don't think that's good enough.

```perl
do {
    use B;
    if ( not B::SVf_POK & B::svref_2object( \$val )->FLAGS
        or -1 != index $val, 'e' ) {
        $val = sprintf '%.4f', $val;
        $val =~ s/\.?0+$//
    }
    "$val"
}
```
Re^2: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by hippo (Archbishop) on Oct 30, 2024 at 11:24 UTC
This approach fails for your given example input of 3.14159:

```perl
use strict;
use warnings;
use Test::More tests => 1;

my $val    = 3.14159;
my $oldval = $val;
my $newval = do {
    use B;
    if ( not B::SVf_POK & B::svref_2object( \$val )->FLAGS
        or -1 != index $val, 'e' ) {
        $val = sprintf '%.4f', $val;
        $val =~ s/\.?0+$//
    }
    "$val"
};
is "$oldval", "$newval";
```

I think you have to either "round to an arbitrarily sane length" as your code does or else just go with the flow.

🦛
Re^3: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Oct 30, 2024 at 12:06 UTC
Sorry, I wasn't clear. "Numeric" values are the result of parsing, but are kept internally as strings (directly from capture buffers), and are treated as numbers only if the user performs a calculation. No encapsulation is enforced by `CAM::PDF`, and some of these values can be replaced/assigned (or new ones created) by the user, hopefully as actual numbers, but perhaps they can also arrive as "numeric" strings. The idea was for original values to be "passed through" unmodified when writing a new file or stringifying content. Ugly strings such as "3.141592653589793", if supplied by the user, will sneak into the output, but I guess let it be; it's syntactically OK. It's too much hassle otherwise to really keep track of what has come from parsing the source and what was added by the user.
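For comparison, the pass-through idea can also be sketched without looking at scalar flags. The version below is a swapped-in alternative, not the code from this thread: it decides purely on the textual form of the value, and the helper name `pdf_number` is hypothetical. Anything that already reads as a plain integer or decimal is emitted unchanged; anything else (notably exponent notation) is re-formatted to four decimals, the same arbitrary precision used in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: return a string that is safe to write as a PDF number.
sub pdf_number {
    my ($val) = @_;

    # Plain integer or decimal text (optional sign, no exponent): pass through
    # exactly as given, whether it came from parsing or from the user.
    return "$val" if $val =~ /\A[+-]?(?:\d+\.?\d*|\.\d+)\z/;

    # Everything else, e.g. "1e-05", is re-formatted with fixed precision,
    # then trailing zeros and a dangling decimal point are trimmed.
    my $out = sprintf '%.4f', $val;
    $out =~ s/0+\z//;
    $out =~ s/\.\z//;
    return $out;
}

print pdf_number('3.14159'), "\n";    # prints "3.14159"
print pdf_number(3.14159),   "\n";    # prints "3.14159"
print pdf_number('1.5e2'),   "\n";    # prints "150"
```

The trade-off matches the one accepted above: long but valid decimals supplied by the user pass through untouched, and only values that would be a syntax error in the output are rounded.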
Re^2: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Nov 04, 2024 at 23:18 UTC
Another question (hopefully these aren't regarded as "do the research for me" or "write code for me"). W.r.t. output, is adding a dependency on `File::Copy` for a trivial task a bad idea? Is "append mode" perhaps considered undesirable in file systems I have no experience with?

Originally, a file is always slurped in, and the content scalar is either appended to or re-generated from scratch for an incremental or clean save respectively; then the output file is opened in create mode and the scalar is printed. In the worst case (a minute incremental update to a very large file using the same file name), an almost exact huge copy is written anew.

I don't like this design and re-wrote this part. Incrementally updating to the same file name opens the file in append mode, then as much or as little data as required is written. Incrementally updating to a different file name uses `File::Copy::copy()` before that. This is especially useful if the original file wasn't slurped in. I could do without `File::Copy`, performing instead a read/print loop through some small buffer (16 KB or whatever is an OK value these days), but why shouldn't I use a dedicated/optimised module? Only to avoid adding a dependency? On a core module?

As to "no appending" in the original `CAM::PDF`, I can envision a case where the file is modified by someone between my read and write. Then a corrupt PDF would be saved. But it's no worse than my erasing "someone"'s work by opening a file in create mode. Both scenarios are somewhat improbable for PDFs on disk.
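The append-or-copy-then-append flow described above might be sketched as follows. This is only an illustration: `append_update` is a hypothetical helper name, the payload is a plain byte string rather than real PDF update data, and error handling is reduced to `die`. `File::Copy::copy` and `File::Temp::tempdir` are the only library calls used.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical helper: append $update to $src in place, or, when a different
# $dst is given, copy $src to $dst first and append there.
sub append_update {
    my ( $src, $dst, $update ) = @_;

    if ( defined $dst && $dst ne $src ) {
        copy( $src, $dst ) or die "copy $src -> $dst failed: $!";
    }
    else {
        $dst = $src;
    }

    # Append mode: only the new bytes are written, never the whole file.
    open my $out, '>>:raw', $dst or die "cannot append to $dst: $!";
    print {$out} $update or die "write to $dst failed: $!";
    close $out or die "close $dst failed: $!";

    return -s $dst;
}

# Demo on throwaway files.
my $dir = tempdir( CLEANUP => 1 );
my ( $orig, $new ) = ( "$dir/orig.bin", "$dir/new.bin" );

open my $fh, '>:raw', $orig or die "cannot create $orig: $!";
print {$fh} 'ORIGINAL';
close $fh or die "close $orig failed: $!";

my $size_new   = append_update( $orig, $new, '+UPDATE' );    # different name
my $size_after = -s $orig;                                   # source untouched
my $size_same  = append_update( $orig, undef, '+UPDATE' );   # same name

print "$size_new $size_after $size_same\n";                  # prints "15 8 15"
```

Neither branch needs the original content in memory, which is what makes this approach compatible with the "no slurping" mode.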
Re^3: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by sleet (Monk) on Nov 05, 2024 at 01:26 UTC
> is adding a dependency on File::Copy for trivial task a bad idea?

Note that it's included in core:

```
$ corelist File::Copy

Data for 2024-10-20
File::Copy was first released with perl 5.002
```
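The same lookup can be done from inside Perl, which may be handy when checking several candidate dependencies against a minimum supported version such as the 5.010 mentioned earlier in the thread. A minimal sketch using `Module::CoreList` (itself a core module); the `$minimum_perl` variable is just an assumption for the demo:

```perl
use strict;
use warnings;
use Module::CoreList;

# Assumed minimum Perl version for the distribution (hypothetical value,
# taken from the "backward compatible to e.g. 5.010" remark in the thread).
my $minimum_perl = 5.010;

# first_release returns the Perl version in which the module first shipped.
my $first = Module::CoreList->first_release('File::Copy');

print "File::Copy first shipped with perl $first\n";
print "safe to rely on for perl $minimum_perl and later\n"
    if defined $first && $first <= $minimum_perl;
```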
Re^4: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by marto (Cardinal) on Nov 05, 2024 at 07:49 UTC
Re^5: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by stevieb (Canon) on Nov 05, 2024 at 07:59 UTC
Re^4: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by stevieb (Canon) on Nov 05, 2024 at 07:42 UTC
Re: Choosing namespace/name for (my first) CPAN module which is a sub-class of a well-known distribution by Anonymous Monk on Dec 17, 2024 at 00:05 UTC
Back in business after a distraction. Tutorials and guides demand tests. Stealing the parent's tests doesn't look good (gonna do it anyway), so I'll write something for the added features. How do I test whether "no slurping" mode is indeed functional? Is what follows overkill? Are there blunders, can it be improved? Have I overlooked something on CPAN?

Btw, the `PerlIO::via` docs leave a few things to be desired. PUSHED is said to return "an object or the class". "Class" what? I can only satisfy it with a blessed whatever, even one as ridiculous as below. READ's arg is said to be "`$buffer`", but then I only succeeded with not very nice direct access to an `@_` element (won't `Perl::Critic` faint over it?). And using the package as written as e.g. an IO layer and then reading with the diamond operator (not related to my tests) generates random-length strings, very weird. Perhaps it could be mentioned that without FILL defined (READ only) `readline` is unusable? It's strange that the few core and CPAN modules I checked all work through FILL. No one does READ.

```perl
use strict;
use warnings;
use feature 'say';
use Test::More;

BEGIN {
    package Counter;
    use strict;
    use warnings;
    our $count;
    sub PUSHED { $count = 0; return bless \(1 + 1), 1 }
    sub READ {
        my ( undef, $buffer, $len, $fh ) = @_;
        my $bytes = read $fh, $_[1], $len;
        $count += $bytes;
        return $bytes
    }
    sub BINMODE { 0 }
    sub SEEK { CORE::seek( $_[3], $_[1], $_[2] ) }
}

BEGIN {
    *CORE::GLOBAL::open = sub {
        splice @_, 1, 1, '<:raw:via(Counter)'
            if defined $_[1] and $_[1] eq '<';
        &CORE::open
    }
};

my $fname = 't/sample1.pdf';
die unless -e $fname;
my $fsize = -s _;
my $doc;

use CAM::PDF;

$doc = CAM::PDF-> new( $fname );
ok $doc, "file reading OK with CAM::PDF";
cmp_ok $Counter::count, '==', $fsize,
    "had to read $Counter::count bytes of total $fsize";
$doc-> getPageDimensions( 1 );
cmp_ok $Counter::count, '==', $fsize,
    "for info 1, cumulative read is OK: $Counter::count";
$doc-> getPageDimensions( 2 );
cmp_ok $Counter::count, '==', $fsize,
    "for info 2, cumulative read is OK: $Counter::count";
$doc-> cleansave;
cmp_ok $Counter::count, '==', $fsize,
    "for all info, cumulative read is OK: $Counter::count";

# done_testing;

### you may wish to delete to the end of file for SSCCE,
### PDF::Manip is not released yet

use lib 'lib';
use PDF::Manip;

# $doc = PDF::Manip-> new( $fname );
# ok $doc, "file reading OK with PDF::Manip";
# cmp_ok $Counter::count, '==', $fsize,
#     "had to read $Counter::count bytes of total $fsize";
# $doc-> getPageDimensions( 1 );
# cmp_ok $Counter::count, '==', $fsize,
#     "for info 1, cumulative read is OK: $Counter::count";
# $doc-> getPageDimensions( 2 );
# cmp_ok $Counter::count, '==', $fsize,
#     "for info 2, cumulative read is OK: $Counter::count";
# $doc-> cleansave;
# cmp_ok $Counter::count, '==', $fsize,
#     "for all info, cumulative read is OK: $Counter::count";

$doc = PDF::Manip-> new( $fname, { slurp => 0 });
ok $doc, "file reading OK with PDF::Manip (no slurping)";
cmp_ok $Counter::count, '<', $fsize,
    "had to read $Counter::count bytes of total $fsize";
$doc-> getPageDimensions( 1 );
cmp_ok $Counter::count, '<', $fsize,
    "for info 1, cumulative read is OK: $Counter::count";
$doc-> getPageDimensions( 2 );
cmp_ok $Counter::count, '<', $fsize,
    "for info 2, cumulative read is OK: $Counter::count";
$doc-> cleansave;
cmp_ok $Counter::count, '>=', $fsize,
    "for all info, cumulative read is OK: $Counter::count";

done_testing;

__END__
ok 1 - file reading OK with CAM::PDF
ok 2 - had to read 621710 bytes of total 621710
ok 3 - for info 1, cumulative read is OK: 621710
ok 4 - for info 2, cumulative read is OK: 621710
ok 5 - for all info, cumulative read is OK: 621710
ok 6 - file reading OK with PDF::Manip (no slurping)
ok 7 - had to read 3487 bytes of total 621710
ok 8 - for info 1, cumulative read is OK: 3593
ok 9 - for info 2, cumulative read is OK: 3702
ok 10 - for all info, cumulative read is OK: 623916
1..10
```