jimmygoogle has asked for the wisdom of the Perl Monks concerning the following question:

I have a module with many, many packages in it. It's becoming problematic to work with in git, with conflicts and such. When I split it up into a file per class, I get 85 new files. After some testing, I am seeing that code calling the split version of my module is now slower than when it called the old monolithic package. My question is: why could this be? Even splitting the code into 10 files shows no performance improvement. Is this best left as is for performance? Is there maybe a middle ground? Any input here is appreciated. Here is what my code looks like:
package Foo;

package A;
our @ISA = qw{Foo};

package B;
our @ISA = qw{Foo};

package C;
our @ISA = qw{Foo};

package Bar;

package D;
our @ISA = qw{Bar};

package E;
our @ISA = qw{Bar};

package F;
our @ISA = qw{Bar};

...

Replies are listed 'Best First'.
Re: Splitting large module
by hippo (Archbishop) on Jul 11, 2018 at 08:21 UTC
    Is there maybe a middle ground?

    You could use conditional loading. This will benefit your runtimes in cases where the code to be executed only requires a small number of your 85 packages; i.e. it saves time by not loading the 80 or so files that it will never use. There are a number of ways to achieve this, and the one which will be most effective for you depends on what your entire codebase does, how scripts will use the modules, and how modules will use each other. What you've shown in your stylised example is a 2-tier hierarchy. Is it the same the whole way through?
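    For illustration, here is one minimal shape such on-demand loading could take (a sketch only: the load_class helper is made up, and it assumes each class lives in its own .pm file named after the package):

    use strict;
    use warnings;

    # Translate a class name into its file path and load it on first use.
    # require is effectively free on repeat calls, since %INC caches files.
    sub load_class {
        my ($class) = @_;
        (my $file = "$class.pm") =~ s{::}{/}g;
        require $file;
        return $class;
    }

    # Only the classes a given request actually touches get compiled:
    my $class = load_class('A');
    my $obj   = $class->new;    # assuming the class provides a constructor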

Re: Splitting large module
by Corion (Patriarch) on Jul 11, 2018 at 07:28 UTC

    I would guess the slowdown simply happens because of the increased disk I/O from opening and reading 85 files instead of a single one.

    How large in absolute terms is the performance difference? If it really is large, maybe you need to choose a different disk layout to make loading the files faster? If your script is just running as a short-lived process, maybe you can change that to a longer-lived process by passing in more work to process in one go?

      I think that all of the use statements will be resolved at compile time, so this would not account for the perceived difference in performance.

        There is no real distinction in Perl between "compile time" and "runtime", as Perl needs to load all files each time a program is run.

        I can very well imagine a high-latency (or flaky) network connection (maybe NFS) that makes loading files quite slow. This would slow down program startup at least linearly in the number of files that need to be opened and read.

Re: Splitting large module
by kcott (Archbishop) on Jul 11, 2018 at 07:48 UTC

    G'day jimmygoogle,

    Welcome to the Monastery.

    We really need to see your split code as well as your unsplit code to make a comparison.

    I could suggest looking at the parent pragma:

    package A;
    ...
    use parent 'Foo';
    ...

    But, of course, you may be using that already.
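    Note that use parent will, by default, try to require "Foo.pm" from disk. If the parent class is defined in the same file, the -norequire flag skips that lookup; a sketch:

    package A;
    use parent -norequire, 'Foo';    # Foo lives in this same file, so skip the require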

    Here are some things you could do to improve your post and get better help from us: show both the unsplit and split versions of your code, and describe how you measured the difference.

    — Ken

      Pardon my noob-ness. Let me add some more details and try to answer some questions. The code is running as a mod_perl app, so I would think all of this code would be loaded when the server is started up. I do need all of them loaded, since I don't know when they might be used. Unfortunately, using use parent isn't that easy in my ecosystem: we still have development environments on 5.8 and others on 5.20.2. That wrinkle aside, I am stuck with this hardware and configuration; I need to make due with what I am given.

      So the unsplit file looks a little more like this.

      package ThisIsMyPackge;

      package Foo;

      package A;
      our @ISA = qw{Foo};
      sub validate { ... }    # do something

      package B;
      our @ISA = qw{Foo};
      sub validate { ... }    # do something

      package C;
      our @ISA = qw{Foo};
      sub validate { ... }    # do something

      package Bar;

      package D;
      our @ISA = qw{Bar};
      sub apply { ... }       # do something

      package E;
      our @ISA = qw{Bar};
      sub apply { ... }       # do something

      package F;
      our @ISA = qw{Bar};
      sub apply { ... }       # do something

      And the split version now looks like this:

      package ThisIsMyPackge;

      use A;
      use B;
      use C;
      use D;
      use E;
      use F;
      # ... 79 more use statements ...

      I didn't use Benchmark; I used Time::HiRes to calculate the time it takes to run the (modified) foreach loop below. The timings are based on an average of 5 runs through the code. I have done more runs, but the results don't differ, so I use 5 for my calculations. Note this foreach loop isn't called in ThisIsMyPackge.pm; it's called from another file.

      unsplit: 0.14858s

      split: 0.4153s

      foreach my $bar (@{$objects}) {
          my $foo = $bar->object;
          ....
          next unless $foo->validate;
          $bar->apply;
          ....
      }
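      For reference, the measurement was along these lines (a sketch of the Time::HiRes usage described above; the loop body is the one shown, elisions and all):

      use Time::HiRes qw(gettimeofday tv_interval);

      my $t0 = [gettimeofday];
      foreach my $bar (@{$objects}) {
          my $foo = $bar->object;
          next unless $foo->validate;
          $bar->apply;
      }
      printf "elapsed: %.5fs\n", tv_interval($t0);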

      I put in other timings to isolate the bottleneck, and the slowdown is around the $foo->validate and $bar->apply calls. All of the other logic in the loop has almost exactly the same timings, and the results from Devel::NYTProf point to this area of the code as well. This might not seem like a lot of time, but over the course of thousands of concurrent hits it is a pretty big decrease in performance.

        "Pardon my noob-ness."

        No need to apologise. That was your first post.

        "The code is running as a mod_perl app ..."

        It's been over a decade since I did any serious work with mod_perl; I doubt I'm qualified to give much in the way of advice. I do seem to recall some sort of pre-load (or maybe pre-fetch, or something like that) feature: perhaps look into that.
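        For what it's worth, the usual shape of that pre-loading is a startup script pulled into the Apache config with PerlRequire; a sketch, with illustrative paths:

        # httpd.conf:   PerlRequire /etc/httpd/startup.pl
        # startup.pl runs once at server start, before children are forked,
        # so the compiled modules are shared by all child processes.
        use strict;
        use warnings;
        use lib '/path/to/your/lib';    # illustrative path
        use ThisIsMyPackge;             # preload the whole hierarchy here
        1;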

        "I do need all of them loaded since I dont know when they might be used."

        Does that also equate to "don't know if they might be used"? You might consider loading modules on demand. ++hippo discussed this. Loading a core set of modules initially, then others only when needed, would at least spread the load time, even if you do eventually want all of them.

        "We have development environments ... I need to make due with what I am given."

        Yes, I understand that; I've been in that situation myself. If you do end up sticking with your current unsplit setup, I'd just recommend fully documenting what you have and making that documentation obvious. There have certainly been times when I've looked for "lib/X/Y/Z.pm" and, after much frustrated searching, found "package X::Y::Z;" in "lib/M/N/O/P.pm".

        — Ken

        I need to make due with what I am given.

        It's "make do".

Re: Splitting large module
by tobyink (Canon) on Jul 14, 2018 at 09:16 UTC

    When my module Types::Standard grew to over 2000 lines, I decided to split out some of the bigger method definitions into separate files. I replaced them in the main module with stub subs which, when called, load the file where the real sub is defined, grab the real function, and replace the stub.

    This has reduced the main module to about 870 lines, and I'm considering splitting out even more subs to reduce it further. It makes loading the main module measurably faster at the cost of slowing down the first calls of the functions which have been split out (they'll run at normal speed thereafter).
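    A minimal sketch of that stub-sub pattern (the package and file names here are hypothetical; Types::Standard's real mechanism differs in detail):

    package My::Module;
    use strict;
    use warnings;

    # Stub: the first call loads the real implementation, swaps it into
    # the symbol table, and re-dispatches with the original arguments.
    sub heavy_sub {
        require My::Module::Heavy;    # hypothetical file defining the real sub
        {
            no strict 'refs';
            no warnings 'redefine';
            *{'My::Module::heavy_sub'} = \&My::Module::Heavy::heavy_sub;
        }
        goto &My::Module::Heavy::heavy_sub;    # later calls bypass the stub entirely
    }

    1;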

    UPDATE: since writing the above, I've gotten it down to about 680 lines.

      So I figured it out: I was given a program (written by someone else) that would split the file for me. The file I was splitting was ~13K lines, and there must have been a bug in the splitter. I split the file by hand and have no issues now.