jimmygoogle has asked for the wisdom of the Perl Monks concerning the following question:

I have a module with many, many packages in it. It's becoming problematic to work with in git, with conflicts and such. When I split it up into a file per class, I get 85 new files. After some testing, I am seeing that code calling the split version of my module is now slower than when it called the old monolithic package. My question is: why could this be? Even splitting the code into 10 files shows no performance improvement. Is this best left as is for performance? Is there maybe a middle ground? Any input here is appreciated. Here is what my code looks like:
package Foo;

package A;
our @ISA = qw{Foo};

package B;
our @ISA = qw{Foo};

package C;
our @ISA = qw{Foo};

package Bar;

package D;
our @ISA = qw{Bar};

package E;
our @ISA = qw{Bar};

package F;
our @ISA = qw{Bar};

...

Replies are listed 'Best First'.
Re: Splitting large module
by hippo (Archbishop) on Jul 11, 2018 at 08:21 UTC
    Is there maybe a middle ground?

    You could use conditional loading. This will benefit your runtimes in cases where the code to be executed only requires a small number of your 85 packages; i.e. it saves time by not loading the 80 or so files that it will never use. There are a number of ways to achieve this, and the one which will be most effective for you depends on what your entire codebase does, how scripts will use the modules, and how modules will use each other. What you've shown in your stylised example is a 2-tier hierarchy. Is it the same the whole way through?
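    For illustration, here is one minimal shape such on-demand loading could take (a sketch only: the load_class helper is made up, and it assumes each class lives in its own .pm file named after the package):

    use strict;
    use warnings;

    # Translate a class name into its file path and load it on first use.
    # require is effectively free on repeat calls, since %INC caches files.
    sub load_class {
        my ($class) = @_;
        (my $file = "$class.pm") =~ s{::}{/}g;
        require $file;
        return $class;
    }

    # Only the classes a given request actually touches get compiled:
    my $class = load_class('A');
    my $obj   = $class->new;    # assuming the class provides a constructor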

Re: Splitting large module
by Corion (Patriarch) on Jul 11, 2018 at 07:28 UTC

    I would guess the slowdown simply happens because of the increased disk I/O from opening and reading 85 files instead of a single one.

    How large in absolute terms is the performance difference? If it really is large, maybe you need to choose a different disk layout to make loading the files faster? If your script is just running as a short-lived process, maybe you can change that to a longer-lived process by passing in more work to process in one go?

      I think that all of the use statements will be resolved at compile time, so this would not account for the perceived difference in performance.

        There is no real distinction in Perl between "compile time" and "runtime", as Perl needs to load all files each time a program is run.

        I can very well imagine a high-latency (or flaky) network connection (maybe NFS) that makes loading files quite slow. This would slow down program startup at least linearly in the number of files that need to be opened and read.

Re: Splitting large module
by kcott (Archbishop) on Jul 11, 2018 at 07:48 UTC

    G'day jimmygoogle,

    Welcome to the Monastery.

    We really need to see your split code as well as your unsplit code to make a comparison.

    I could suggest looking at the parent pragma:

    package A;
    ...
    use parent 'Foo';
    ...

    But, of course, you may be using that already.
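    Note that use parent will, by default, try to require "Foo.pm" from disk. If the parent class is defined in the same file, the -norequire flag skips that lookup; a sketch:

    package A;
    use parent -norequire, 'Foo';    # Foo lives in this same file, so skip the require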

    Here are some things you could do to improve your post and get better help from us: show both the unsplit and split versions of your code, and describe how you measured the difference.

    — Ken

      Pardon my noob-ness. Let me add some more details and try to answer some questions. The code is running as a mod_perl app, so I would think all of this code would be loaded when the server is started up. I do need all of them loaded, since I don't know when they might be used. Unfortunately, using use parent isn't that easy in my ecosystem: we still have development environments on 5.8 and others on 5.20.2. That wrinkle aside, I am stuck with this hardware and configuration; I need to make due with what I am given.

      So the unsplit file looks a little more like this.

      package ThisIsMyPackge;

      package Foo;

      package A;
      our @ISA = qw{Foo};
      sub validate { ... }    # do something

      package B;
      our @ISA = qw{Foo};
      sub validate { ... }    # do something

      package C;
      our @ISA = qw{Foo};
      sub validate { ... }    # do something

      package Bar;

      package D;
      our @ISA = qw{Bar};
      sub apply { ... }       # do something

      package E;
      our @ISA = qw{Bar};
      sub apply { ... }       # do something

      package F;
      our @ISA = qw{Bar};
      sub apply { ... }       # do something

      And the split version now looks like this:

      package ThisIsMyPackge;

      use A;
      use B;
      use C;
      use D;
      use E;
      use F;
      # ... 79 more use statements ...

      I didn't use Benchmark; I used Time::HiRes to calculate the time it takes to run the (modified) foreach loop below. The timings are based on an average of 5 runs through the code. I have done more runs, but the results don't differ, so I use 5 for my calculations. Note this foreach loop isn't called in ThisIsMyPackge.pm; it's called from another file.

      unsplit: 0.14858s

      split: 0.4153s

      foreach my $bar (@{$objects}) {
          my $foo = $bar->object;
          ....
          next unless $foo->validate;
          $bar->apply;
          ....
      }
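      For reference, the measurement was along these lines (a sketch of the Time::HiRes usage described above; the loop body is the one shown, elisions and all):

      use Time::HiRes qw(gettimeofday tv_interval);

      my $t0 = [gettimeofday];
      foreach my $bar (@{$objects}) {
          my $foo = $bar->object;
          next unless $foo->validate;
          $bar->apply;
      }
      printf "elapsed: %.5fs\n", tv_interval($t0);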

      I put in other timings to isolate the bottleneck, and the slowdown is around the $foo->validate and $bar->apply calls. All of the other logic in the loop has almost exactly the same timings, and the results from Devel::NYTProf point to this area of the code as well. This might not seem like a lot of time, but over the course of thousands of concurrent hits it is a pretty big decrease in performance.

        "Pardon my noob-ness."

        No need to apologise. That was your first post.

        "The code is running as a mod_perl app ..."

        It's been over a decade since I did any serious work with mod_perl; I doubt I'm qualified to give much in the way of advice. I do seem to recall some sort of pre-load (or maybe pre-fetch, or something like that) feature: perhaps look into that.
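        For what it's worth, the usual shape of that pre-loading is a startup script pulled into the Apache config with PerlRequire; a sketch, with illustrative paths:

        # httpd.conf:   PerlRequire /etc/httpd/startup.pl
        # startup.pl runs once at server start, before children are forked,
        # so the compiled modules are shared by all child processes.
        use strict;
        use warnings;
        use lib '/path/to/your/lib';    # illustrative path
        use ThisIsMyPackge;             # preload the whole hierarchy here
        1;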

        "I do need all of them loaded since I dont know when they might be used."

        Does that also equate to "don't know if they might be used"? You might consider loading modules on demand. ++hippo discussed this. Loading a core set of modules initially, then others only when needed, would at least spread the load time, even if you do eventually want all of them.

        "We have development environments ... I need to make due with what I am given."

        Yes, I understand that; I've been in that situation myself. If you do end up sticking with your current unsplit setup, I'd just recommend fully documenting what you have and making that documentation obvious. There have certainly been times when I've looked for "lib/X/Y/Z.pm" and, after much frustrated searching, found "package X::Y::Z;" in "lib/M/N/O/P.pm".

        — Ken

        I need to make due with what I am given.

        It's "make do".

Re: Splitting large module
by tobyink (Canon) on Jul 14, 2018 at 09:16 UTC

    When my module Types::Standard grew to over 2000 lines, I decided to split out some of the bigger method definitions into separate files. I replaced them in the main module with stub subs which, when called, load the file where the real sub is defined, grab the real function, and replace the stub.

    This has reduced the main module to about 870 lines, and I'm considering splitting out even more subs to reduce it further. It makes loading the main module measurably faster at the cost of slowing down the first calls of the functions which have been split out (they'll run at normal speed thereafter).
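    A minimal sketch of that stub-sub pattern (the package and file names here are hypothetical; Types::Standard's real mechanism differs in detail):

    package My::Module;
    use strict;
    use warnings;

    # Stub: the first call loads the real implementation, swaps it into
    # the symbol table, and re-dispatches with the original arguments.
    sub heavy_sub {
        require My::Module::Heavy;    # hypothetical file defining the real sub
        {
            no strict 'refs';
            no warnings 'redefine';
            *{'My::Module::heavy_sub'} = \&My::Module::Heavy::heavy_sub;
        }
        goto &My::Module::Heavy::heavy_sub;    # later calls bypass the stub entirely
    }

    1;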

    UPDATE: since writing the above, I've gotten it down to about 680 lines.

      So I figured it out: I was given a program (written by someone else) that would split the file for me. The file I was splitting was ~13K lines, and there must have been a bug in the splitter. I split the file by hand and have no issues now.