
Splitting program into modules

by lis128 (Acolyte)
on Nov 10, 2018 at 22:14 UTC ( [id://1225546]=perlquestion )

lis128 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

at my job I am the last one who understands and worships spiritual Perl power.
Because of that I was sent to a dungeon full of hash pounds and dollar signs, named Connetor.pl, to alter its way of working.

I was given 14k lines of wise, valid and even pretty, but not self-documenting Perl code. Sub definitions are mixed with "main" code and sub calls; a database routine ends just to call curl on the main "thread", and right after that another sub is defined.
Obviously this does not help in understanding what this code is really doing, or how, so I decided to split it into functional packages.

So, as I did not find any more elegant way to include "reusable" code, I decided to group the subs accessing the database into Database.pm, those interacting with the API for data input ended up in API.pm and so on, leaving main to just call predeclared subs and decide whether to INSERT their results into the database or print them.

The main package has shrunk to circa 300 lines and I've gained much visibility. When I wanted to proceed to unit tests and to documenting every sub's functionality (like: this function CONSUMES a scalar with a URL, PRODUCES an array with "img src" tags) I found that my packaging solution might not be the wisest thing done there.
Of course I didn't foresee that namespaces could be an issue here, and they were.

main calls a custom wrapper to eventually create an instance of a DBI object and holds its ref in $main::sql.
sub sql_connect embedded in Database.pm (Database::sql_connect to be precise) tries to call the "connect" method on $sql, but API.pm's subs also use some $sql methods,
and there are a lot of shared variables like this.
Before my modularization attempt everything worked; now I am forced to replace all "my" declarations in main with "our" in order to give the modules access to these variables.

Also, changing every $variable in the modules to $main::variable syntax and a constantly growing @EXPORT = qw (...); gave me that strange feeling that trying to leave the dungeon leads me to the catacombs.

What am I missing here? How do I properly split this code into logical chunks in separate files, while keeping namespace "main"?

ANY ideas will be appreciated.
My main goal is to document the code, understand its flow and, based on that, create more functionality.

Replies are listed 'Best First'.
Re: Splitting program into modules
by eyepopslikeamosquito (Archbishop) on Nov 11, 2018 at 05:11 UTC

      I must say that the feedback exceeded my expectations.
      Thank you all for the humongous repository of things to read - I really appreciate that.
      I just wanted to say that I have not abandoned the topic and will try to dig through your advice and links.

      In the meantime, I've managed to isolate similar functionality without switching namespaces. I just got rid of the package statements and the use of the Exporter module, but this leads to another problem.
      As I wrote earlier, I am using my own simple debugging routines (yes, I know it can be done better, but I am developing these modules, giving them the required functionality). Let the code speak for itself:

      my $debug = $ENV{'dbg'};
      sub debugInfo {
          my $iWasAt          = ( caller(1) )[3] || "main";
          my $lineWhereCalled = ( caller(0) )[2] || ( caller(1) )[2];
          print STDERR ("\033[1;31m\t$iWasAt\033[0m\@\033[1;32m$lineWhereCalled:\033[0m \t\t@_\n") if ($debug);
      }

      Until now everything went fine: I called debugInfo("entry: @_"); and received the package name with the corresponding line where the call was made, like

      main@139:                     wchodze w loopa, iteracja:4
      Database::sql_connect@144:    entry:
      API::base@13:                 entry: config

      Now, simply use'ing files that are no longer packages makes $iWasAt always main, and the line numbers are relative to each module file.
      So I am looking for another solution, but I feel that with you hackers, nothing's impossible :)
      Going to read thoroughly through your posts, thanks again.

        As I already said you should start by splitting your 14000 lines into multiple files and require or do them, no need to switch packages at the first step.

        (Careful about filescoped private variables)

        Since caller will also tell you the filename, your debug routine can be more explicit then.
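
A minimal sketch of that idea, assuming the debugInfo() routine from the post above: caller() returns the file name as its second element, so the message can carry file and line even when every sub lives in package main. The helper name debugLine() and the example sub are made up for illustration.

```perl
use strict;
use warnings;

my $debug = $ENV{dbg};

# Describe the call site $depth frames above this sub as "sub@file:line: message".
sub debugLine {
    my ( $depth, @msg ) = @_;
    my ( undef, $file, $line ) = caller($depth);
    my $sub = ( caller( $depth + 1 ) )[3] || 'main';
    return "${sub}\@${file}:${line}: @msg";
}

# Same interface as the original debugInfo(); depth 1 skips this wrapper.
sub debugInfo {
    print STDERR debugLine( 1, @_ ), "\n" if $debug;
    return;
}

# Hypothetical caller, standing in for one of the split-out subs.
sub sql_connect {
    debugInfo("entry: @_");
    return;
}

sql_connect('config');    # prints only when the dbg environment variable is set
```

Because the file name comes from caller() rather than from the package, the output stays unambiguous after the code is spread over several files that are pulled in with require or do.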

        Btw: Using the trace option of the debugger might be another option.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        I'm a big fan of the Log::Log4perl module for logging. For me, the terrific feature of this module is that you can adjust the level of messages you get in your log file -- dial it up to DEBUG to get everything, or back down to WARN for just warnings. In between the two is INFO, containing useful messages about what my scripts are doing.

        If you add log messages to the various modules that you are developing, you'll be able to track in what orders things are happening. It's really illuminating to see this stuff scroll by -- I have status screens that watch the tail end of various log files during production hours so I can stay on top of how my system is behaving.

        Good luck -- let us know how it all turns out.

        Alex / talexb / Toronto

        Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Splitting program into modules
by LanX (Saint) on Nov 10, 2018 at 23:38 UTC
    Well ... some general thoughts on splitting up unknown code

    Incremental strategy
    • create your test suite first
    • do little steps, and always test the result
    • use a revision control system like git
    • commit every change
    • use branches for experiments
    • once you find a previously untested bug, expand your test suite and roll back
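
As an illustration of "test suite first", here is a small characterization test written with the core Test::More module. The sub extract_img_tags() is a hypothetical stand-in for one of the legacy subs (it takes an HTML string rather than a URL so the sketch needs no network); in practice the test would load the real code and pin down what it currently returns.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical legacy sub: CONSUMES a scalar holding HTML,
# PRODUCES a list of the <img ...> tags found in it.
sub extract_img_tags {
    my ($html) = @_;
    return $html =~ /(<img\b[^>]*>)/gi;
}

my @tags = extract_img_tags('<p><img src="a.png"> text <IMG src="b.png"></p>');

is( scalar @tags, 2, 'finds both img tags' );
is( $tags[0], '<img src="a.png">', 'returns the complete first tag' );
is_deeply( [ extract_img_tags('<p>no images</p>') ],
    [], 'returns an empty list when nothing matches' );

done_testing();
```

Tests like this record current behaviour, so each small refactoring step can be checked against them before it is committed.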

    Prerequisites

    Study and understand

    ==== Perl essentials

    • strict and warnings
    • my vs our variables,
    • namespaces, scoping, blocks
    • exporter
    • constants
    • warn and die

    ==== Tools and techniques

    ==== your application

    • your data model ( database, configs)
    • input and output
    • available docs
    • stakeholders (users, contributors, maintainers) to ask

    Analyze
    • analyze the dependencies of your subs (x calls y calls z)
    • analyze the shared variables
    • try to visualize the dependencies in a graph
    • the hierarchy should help identifying logical units (aka modules)
    • use tools to help you analyze like B::Xref
    • other tools like mentioned here: Searching for duplication in legacy code
    • identify dead code (never called subs, commented-out trash)
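
A rough way to get a first "x calls y calls z" overview is a text scan of the source. The sketch below is deliberately naive and is not how B::Xref works: it assumes every sub starts with "sub name" at the beginning of a line, only recognises calls written as name(...), and attributes any top-level code that follows a sub to that sub. The function name call_graph() is made up for this example.

```perl
use strict;
use warnings;

# Naive static scan: returns { sub_name => [ other known subs it calls ] }.
sub call_graph {
    my ($source) = @_;

    # Pass 1: collect the names of all named subs.
    my %known = map { $_ => 1 } $source =~ /^[ \t]*sub\s+(\w+)/mg;

    # Pass 2: cut the source into chunks, each starting at a sub declaration,
    # and look for calls to known subs inside every chunk.
    my %calls;
    for my $chunk ( split /^(?=[ \t]*sub\s+\w+)/m, $source ) {
        my ($name) = $chunk =~ /^[ \t]*sub\s+(\w+)/ or next;
        my %seen;
        while ( $chunk =~ /\b(\w+)\s*\(/g ) {
            my $callee = $1;
            next if $callee eq $name || !$known{$callee};
            $seen{$callee} = 1;
        }
        $calls{$name} = [ sort keys %seen ];
    }
    return \%calls;
}

my $source = <<'END_SOURCE';
sub three { return two(shift); }
sub two { return one(_helper(shift)); }
sub one { my $num = calc(shift); display($num); }
sub calc { my $num = shift; return $num ** 3; }
sub display { my $num = shift; print "$num\n"; }
sub _helper { my $num = shift; return ++$num; }
END_SOURCE

my $graph = call_graph($source);
for my $name ( sort keys %$graph ) {
    my @callees = @{ $graph->{$name} };
    print "$name -> ", ( @callees ? join( ', ', @callees ) : '(nothing)' ), "\n";
}
```

Subs that never show up as a callee in such a listing are candidates for the dead-code check, though calls made through references or strings will be missed.
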
    Documentation
    • once you understood a mechanism, write it down
    • use Pod headers when possible
    • normalize (beautify) your code with Perl::Tidy
    • review your pod2html from time to time and fill gaps

    Modularisation / Namespaces ?
    • bundle subs into modules by functionality not technology (not all sql in one module, eg look at the TABLE names )
    • require into the same namespace might be an easier intermediate step before learning to use exporter °
    • modules normally require namespaces (packages)
    • package variables in other modules need to be fully qualified when used from outside, e.g. $Pkg::var °
    • same for Pkg::subs() °
    • modules allow you to export vars and subs when they are loaded with use
    • modules allow you to pass and init shared variables via import when they are loaded with use (like a database handle)
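
The last point can be sketched as follows. This is only an illustration under assumptions: My::Database and fetch_user() are invented names, a plain hash stands in for the DBI handle, and the package is inlined in a block so the example runs as a single file (normally it would sit in its own .pm file and be loaded with use).

```perl
use strict;
use warnings;

{
    package My::Database;    # hypothetical module, normally My/Database.pm

    my $dbh;                 # visible only inside this block

    # import() is what "use My::Database ( dbh => $handle )" calls behind the scenes.
    sub import {
        my ( $class, %args ) = @_;
        $dbh = $args{dbh} if exists $args{dbh};

        # Export fetch_user() into the namespace of whoever loaded us.
        my $caller = caller;
        no strict 'refs';
        *{"${caller}::fetch_user"} = \&fetch_user;
        return;
    }

    sub fetch_user {
        my ($id) = @_;
        die "no database handle configured\n" unless $dbh;
        return $dbh->{users}{$id};    # stand-in for a real query on the handle
    }
}

# A plain hash plays the role of the database handle in this sketch.
my $fake_dbh = { users => { 7 => 'lis128' } };

# With a separate file this line would read: use My::Database ( dbh => $handle );
# and the handle would have to exist at compile time (e.g. built in a BEGIN block).
My::Database->import( dbh => $fake_dbh );

my $name = fetch_user(7);
print "$name\n";
```

The handle is handed over once, and neither main nor the module needs "our" variables or $main:: prefixes to share it.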

    Object Oriented Programming

    • logical modules are sometimes better OO classes
    • check guides on "when OOP is better"
    • indicator: if you have to pass around same arguments
    • indicator: group of subs access always same global vars
    • indicator: init() routines for globals are sometimes better ->new()
    • easier to construct an object with encapsulated instance vars and class vars
    • have a look at Moo before doing old style OOP
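
To make the init()-versus-new() indicator concrete, here is a minimal old-style class using only core Perl (Moo would shorten it). The class name Connection and its fields are invented for the example; the point is that state which used to live in globals set by an init() routine moves into the object.

```perl
use strict;
use warnings;

{
    package Connection;    # hypothetical class name

    # new() takes over the job of init(): each object carries its own state.
    sub new {
        my ( $class, %args ) = @_;
        my $self = {
            host  => $args{host} // 'localhost',
            calls => 0,
        };
        return bless $self, $class;
    }

    # A former "sub that reads globals" becomes a method that reads $self.
    sub describe {
        my ($self) = @_;
        $self->{calls}++;
        return "$self->{host} (call $self->{calls})";
    }
}

my $prod = Connection->new( host => 'db.example.com' );
my $test = Connection->new;

print $prod->describe(), "\n";
print $test->describe(), "\n";
print $prod->describe(), "\n";
```

Two instances can now coexist without one call to an init() routine overwriting the state the other depends on.
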
    Improvement
    • make your code more fault tolerant
    • add argument checking to your subs
    • rewrite many positional args into named args
    • condense duplicated code into new functions
    • limit the scope of vars and subs if possible
    • give identifiers like variables meaningful names
    • document your strategy for future maintainers
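
A sketch of the "named args plus argument checking" items, using the core Carp module. The sub name fetch_rows() and its parameters are hypothetical; it only validates and normalises its arguments so the example stays self-contained.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sub: formerly fetch_rows($table, $limit, $verbose),
# now called as fetch_rows( table => 'users', limit => 5 ).
sub fetch_rows {
    my (%args) = @_;

    # Reject unknown keys so that typos fail loudly.
    my %allowed = map { $_ => 1 } qw(table limit verbose);
    for my $key ( sort keys %args ) {
        croak "fetch_rows: unknown argument '$key'" unless $allowed{$key};
    }

    croak "fetch_rows: 'table' is required" unless defined $args{table};

    my $limit = $args{limit} // 10;    # default when not given
    croak "fetch_rows: 'limit' must be a positive integer"
        unless $limit =~ /^[1-9][0-9]*$/;

    # Return the normalised arguments; a real version would run the query here.
    return {
        table   => $args{table},
        limit   => $limit,
        verbose => $args{verbose} ? 1 : 0,
    };
}

my $query = fetch_rows( table => 'users', limit => 5 );
print "table=$query->{table} limit=$query->{limit}\n";
```

croak reports the error from the caller's point of view, which is usually where the mistake is.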

    See Also

    Cheers Rolf

    update

    Expanded the OOP part

    °) that's an intermediate step exporting and importing is much cleaner

    ²) As a rule of thumb from easier to better

    • require files same namespace (NB: my-lexicals can't be shared)
    • try to group connected subs into namespaces with package
    • use module different namespaces => fully qualified identifiers for shared data
    • use module different namespaces => exporting and importing shared data
    • use oo-class : shared data in instance/class-vars and methods, transport via constructor and setter/getters
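
The first step of that list, and its caveat about my-lexicals, can be tried out with a throwaway file. In this sketch the chunk is written to a temporary directory (via the core File::Temp module) only to keep the example self-contained; the file and variable names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a throwaway "chunk" file; in real life this would be e.g. database.pl.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/chunk.pl";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} <<'END_CHUNK';
use strict;
use warnings;

my  $private = 'file-scoped';    # invisible outside this file
our $shared  = 'package var';    # lives in package main, visible everywhere

sub get_private { return $private }    # subs in this file still see $private

1;    # require needs a true value
END_CHUNK
close $out or die "Cannot close $file: $!";

require $file;    # no package statement in the chunk, so everything lands in main

our $shared;      # declare the name here so strict accepts it unqualified
print "shared:  $shared\n";
print "private: ", get_private(), "\n";
```

A sub and the my-variables it relies on have to move into the same file together; anything that several files must see has to become an "our" variable or be passed around explicitly.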

      Object Oriented Programming: logical modules are sometimes better OO classes ...

      As for whether and when to use OO, my simple rule of thumb is to ask "do I need more than one?": if the answer is yes, an object is indicated; if the answer is no, a module.

      A (non Perl-specific) design checklist (derived from On Coding Standards and Code Reviews):

      • Coupling and Cohesion. Systems should be designed as a set of cohesive modules as loosely coupled as is reasonably feasible.
      • Testability. Systems should be designed so that components can be easily tested in isolation.
      • Data hiding. Minimize the exposure of implementation details. Minimize global data.
      • Interfaces matter. Once an interface becomes widely used, changing it becomes practically impossible (just about anything else can be fixed in a later release).
      • Design the module's interface first.
      • Design interfaces that are: consistent; easy to use correctly; hard to use incorrectly; easy to read, maintain and extend; clearly documented; appropriate to your audience. Be sufficient, not complete; it is easier to add a new feature than to remove a mis-feature.
      • Use descriptive, explanatory, consistent and regular names.
      • Correctness, simplicity and clarity come first. Avoid unnecessary cleverness. If you must rely on cleverness, encapsulate and comment it.
      • DRY (Don't repeat yourself).
      • Establish a rational error handling policy and follow it strictly.

        I once had to maintain code which had many subs accessing a bunch of global states which were switched by calling an "init()" routine or by passed flags.

        After long analysis (Freudian yes) I realized that these routines were effectively methods, the states were instance vars and the so called init() routine switched the instances.

        Well actually that was only a simplified description of what happened, I don't wanna give you nightmares. :)

        Cheers Rolf
Re: Splitting program into modules
by karlgoethebier (Abbot) on Nov 11, 2018 at 19:33 UTC

    I don't know how you count. Perhaps it's not so much code if you skip the shebangs, pragmas and all the blanks? And you could consider using Class::Tiny and Role::Tiny to organize your code. Please use SuperSearch to find some examples I've provided in the past. And sorry, I'm on my iPad and copying the links is a pain in the ass 😕 on this device. Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

Re: Splitting program into modules
by harangzsolt33 (Chaplain) on Nov 12, 2018 at 23:52 UTC
    Is there a program that reads a Perl source code and prints out the names of all the subs declared in that file?

    Perhaps an even more useful program might also show the dependencies, so at a quick glance you could see all the subs and which one depends on which one. If they are called sub a3e {} then, of course, that won't reveal much. But if they are called "calc_offset" or "getTimeZone" or something that is self-explanatory, then such a program would help a lot in breaking down this huge code into comprehensible chunks.

    Is there such a program?

      My Devel::Examine::Subs can list subs within files.

      Single file example:

      use warnings;
      use strict;

      use Devel::Examine::Subs;

      my $des = Devel::Examine::Subs->new(file => 'lib/Devel/Examine/Subs.pm');

      my $subs = $des->all;

      print "$_\n" for @$subs;

      Output:

      BEGIN
      new
      all
      has
      missing
      lines
      module
      objects
      search_replace
      replace
      inject_after
      inject
      remove
      order
      backup
      add_functionality
      engines
      pre_procs
      post_procs
      run
      valid_params
      _cache
      _cache_enabled
      _cache_safe
      _clean_config
      _clean_core_config
      _config
      _file
      _params
      _read_file
      _run_directory
      _run_end
      _write_file
      _core
      _pre_proc
      _proc
      _post_proc
      _engine
      _pod

      You can also do entire directory structures:

      use warnings;
      use strict;

      use Devel::Examine::Subs;

      my $des = Devel::Examine::Subs->new(file => '.');

      my $data = $des->all;

      for my $file (keys %$data){
          print "$file:\n";
          for my $sub (@{ $data->{$file} }){
              print "\t$sub\n";
          }
      }

      Snipped example output:

      t/test/files/sample.pm:
          one
          one_inner
          one_inner_two
          two
          three
          four
          function
          five
          six
          seven
          eight
      examples/write_new_engine.pl:
          dumps
      lib/Devel/Examine/Subs/Sub.pm:
          BEGIN
          new
          name
          start
          end
          line_count
          lines
          code
      lib/Devel/Examine/Subs/Preprocessor.pm:
          BEGIN
          new
          _dt
          exists
          module
          inject
          replace
          remove
          _vim_placeholder

      The software does a ton of useful things, but these are examples of the most basic functionality. It does not know how to see sub dependencies of other subs. I do have another piece of software that does, but it is intrusive: it actually writes into the Perl files, and you have to run the code to get usable trace information (i.e. if you don't exercise all scenarios, it may not find all flows). I don't have the time at the moment to write a proper scenario for that, but have a look at Devel::Trace::Subs if you're interested. If you don't come up with anything else by morning, I'll create a good example.

        So I've put together a very basic display of how the Devel::Trace::Subs works. Again, it's intrusive; it actually writes into the files you want to capture tracing info from (I wrote this software that another piece of software required, primarily out of sheer curiosity).

        Here's the original Perl file we're working with (./test.pl):

        use warnings;
        use strict;

        three(5);

        sub three {
            return two(shift);
        }
        sub two {
            return one(_helper(shift));
        }
        sub one {
            my $num = calc(shift);
            display($num);
        }
        sub calc {
            my $num = shift;
            return $num ** 3;
        }
        sub display {
            my $num = shift;
            print "$num\n";
        }
        sub _helper {
            my $num = shift;
            return ++$num;
        }

        When run, it produces this output:

        216

        Very basic. Now, install Devel::Trace::Subs, and from the command line, tell it to become traceable:

        perl -MDevel::Trace::Subs=install_trace -e 'install_trace(file => "test.pl")'

        ...now the test.pl file looks like this:

        use warnings;
        use Devel::Trace::Subs qw(trace trace_dump); # injected by Devel::Trace::Subs
        use strict;

        three(5);

        sub three {
            trace() if $ENV{DTS_ENABLE}; # injected by Devel::Trace::Subs
            return two(shift);
        }
        sub two {
            trace() if $ENV{DTS_ENABLE}; # injected by Devel::Trace::Subs
            return one(_helper(shift));
        }
        sub one {
            trace() if $ENV{DTS_ENABLE}; # injected by Devel::Trace::Subs
            my $num = calc(shift);
            display($num);
        }
        sub calc {
            trace() if $ENV{DTS_ENABLE}; # injected by Devel::Trace::Subs
            my $num = shift;
            return $num ** 3;
        }
        sub display {
            trace() if $ENV{DTS_ENABLE}; # injected by Devel::Trace::Subs
            my $num = shift;
            print "$num\n";
        }
        sub _helper {
            trace() if $ENV{DTS_ENABLE}; # injected by Devel::Trace::Subs
            my $num = shift;
            return ++$num;
        }

        I'd like to point out that this software was designed to be used within modules, not normal scripts, but I digress. In order to get the output from the tracing, you have to add a couple of things to your calling script (in this case, it's the original script itself). We'll pretend we're calling modules infected with the trace software here. Add the trace enabling flag, then, after all of the calls you want trace info from have been made, call the trace_dump() function:

        $ENV{DTS_ENABLE} = 1;

        three(5); # this is the original call stack you're running

        trace_dump();

        Now, you get the original output, but you also get the code flow and stack trace information:

        216

        Code flow:

            1: main::three
            2: main::two
            3: main::_helper
            4: main::one
            5: main::calc
            6: main::display

        Stack trace:

            in:      main::three
            sub:     -
            file:    test.pl
            line:    7
            package: main

            in:      main::two
            sub:     main::three
            file:    test.pl
            line:    13
            package: main

            in:      main::_helper
            sub:     main::two
            file:    test.pl
            line:    17
            package: main

            in:      main::one
            sub:     main::two
            file:    test.pl
            line:    17
            package: main

            in:      main::calc
            sub:     main::one
            file:    test.pl
            line:    21
            package: main

            in:      main::display
            sub:     main::one
            file:    test.pl
            line:    22
            package: main

        You can opt via parameters to trace_dump to display just the code flow or the stack trace or both (as is the default as shown above), in text or HTML output formats.

        This is a *very* basic example of how I've used this software. Again, we're using it in a single file here. Normally I'd have a test script using external modules, so the command to return your original code is this:

        perl -MDevel::Trace::Subs=remove_trace -e 'remove_trace(file => "test.pl")'

        ...which returns the script back to default, except for the manual lines (which wouldn't normally be in an original .pl file). Delete these lines manually:

        $ENV{DTS_ENABLE} = 1;
        trace_dump();

        I'll try to put together a much better example of how I really use it in the coming days.

        > It does not know how to see sub dependencies of other subs.

        From what I can see it also doesn't show dependencies from "outer" variables (globals or closure), right?

        update

        which is relevant in this thread

        Cheers Rolf

      Is there a program that reads a Perl source code and prints out the names of all the subs declared in that file?

      Devel::NYTProf generates an HTML report with sortable lists of subs that you can click to view the code with full timing information:

      perl -d:NYTProf script.or.module.pl; nytprofhtml --open
Re: Splitting program into modules
by Anonymous Monk on Nov 12, 2018 at 07:05 UTC
    A contrarian view: 14000 lines of working code is no joke. Why bother refactoring? If you don't understand it, keep trying. To add new functionality simply write new subroutines that follow the conventions of the original code. Fragmenting the code may make it even harder to understand and maintain. Perl makes delicious and potent spaghetti, just add more sauce, and enjoy. K.I.S.S.

      A contrarian view: 14000 lines of working code is no joke. Why bother refactoring?

      Successful software tends to live a long time: bugs are fixed; new features added; new platforms supported; the software adapted to new markets. That is, successful software development is a long term activity. Planning for success means planning for your code to be maintained by a succession of many different programmers over a period of many years. Not planning for that is planning to fail. This is the primary reason for refactoring and continuously keeping the code clean, to make long term code maintenance sustainable.

      Put another way, it's the difference between Programming "Hey, I got it to work!" and Engineering "What happens when code lives a long time?". A quick one-off hack is fine if the code only needs to run a couple of times ... but not if it becomes a long-lived critical feature.

      Programming is easy, Engineering hard. You need to hire programmers with sound technical skills and domain knowledge, enthusiastic, motivated, get things done, keep the code clean, resilient, innovative, team players ... and then motivate them, train them, keep them happy so they don't want to leave, yet have effective handovers when they do ... a hard problem. Yet to be successful that's what you need to do.

      See also: Why Create Coding Standards and Perform Code Reviews?

      > 14000 lines of working code is no joke. Why bother refactoring?

      Such monsters are mostly full of bugs because maintenance becomes impossible once you've lost the overview.

      Let's be generous and assume 100 lines of code and clutter per function on average. That'll mean 140 functions...

      ... divide this by 5 or 10 or 15 ...

      > K.I.S.S.

      D.A.C.D. °

      Splitting up into smaller units, included with do or require is pretty safe² ...

      and will add

      • far better overview already.
      • easier POD-Documentation
      • better control over global vars
      • granulated revision control by changing single files instead of a whole bundle
      • easier deployment
      • more efficient testing
      and I haven't even talked yet about the possibilities to improve this code further like described in my first post.

      Cheers Rolf

      °) Divide and conquer, Dumbo!

      ²) file scoped lexicals must be in the same file as the functions that access them

Re: Splitting program into modules
by harangzsolt33 (Chaplain) on Nov 11, 2018 at 03:50 UTC

    I have written a little sub that includes other files in your perl code. And I think, it's exactly what you need. Just try it and see if it works:

    include('database.pl');

    OR

    my $whatever = include('getTime.pl');

    ...

    sub include{open my$H,'<:raw',$_[0];read($H,my$E,999999)or die"Error: Can't include \"$_[0]\"";close$H;eval$E;}

      Can you tell us how your code improves over do and require?

      Also, please note the limitations of your code, like that it doesn't handle files larger than a megabyte.

        Oh, yes, there seems to be no difference between include() and require. I haven't thought of that! :/
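
For comparison, a hedged sketch of the same include() idea built on the built-in do FILE, which has no size limit and reports its two failure modes through $! (file could not be read) and $@ (file could not be compiled). The temporary file and its contents are invented for the demo, and a file whose last statement legitimately evaluates to undef would be misreported as unreadable by this simple check.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Load a Perl file with do() and die on failure instead of carrying on
# silently. Returns the value of the file's last statement.
sub include {
    my ($file) = @_;
    my $result = do $file;
    unless ( defined $result ) {
        die "Cannot compile $file: $@" if $@;
        die "Cannot read $file: $!\n";
    }
    return $result;
}

# Demo with a throwaway file (absolute path, so @INC is not searched).
my $dir  = tempdir( CLEANUP => 1 );
my $good = "$dir/settings.pl";
open my $out, '>', $good or die "Cannot write $good: $!";
print {$out} <<'END_SETTINGS';
my %settings = ( answer => 42 );
\%settings;    # the value of the last statement is what do() returns
END_SETTINGS
close $out or die "Cannot close $good: $!";

my $settings = include($good);
print "answer: $settings->{answer}\n";

my $ok = eval { include("$dir/missing.pl"); 1 };
print "missing file rejected\n" unless $ok;
```

Unlike do, require additionally remembers what it has loaded in %INC and insists on a true return value, so it suits code that must be loaded exactly once.
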
Re: Splitting program into modules
by Anonymous Monk on Nov 12, 2018 at 11:53 UTC
    What am I missing here? How do I properly split this code into logical chunks in separate files, while keeping namespace "main"?

    There's nothing particularly "proper" about splitting the code into separate files. You keep things in main by writing subroutines, not by fragmenting the codebase and then struggling to unify it. One program should be one file, unless the "parts" are truly going to be reused by other programs (which they usually are not).

      One program should be one file, unless the "parts" are truly going to be reused by other programs (which they usually are not)

      I hope you're not recommending 14,000 lines of main program in a single file! On the contrary, I recommend keeping the main program file short, with most of the work done in (highly cohesive, loosely coupled) modules -- with documentation and a test suite around each module.

      You can find many examples of this approach on the CPAN. For example, in Perl::Tidy and Perl::Critic, the perltidy and perlcritic main programs are not much more than one-liners, essentially just:

      use Perl::Tidy; Perl::Tidy::perltidy();
      and:
      use Perl::Critic::Command qw< run >; run();
      with all the work being done in (well-documented) modules with test suites around each module.
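
The module side of that pattern might look roughly like this. It is a sketch only: My::App::Command is an invented name, it is inlined in a block so the example runs as one file, and it does not reflect how Perl::Tidy or Perl::Critic are implemented internally.

```perl
use strict;
use warnings;

{
    package My::App::Command;    # hypothetical module, normally in its own .pm file

    # All real work lives here; the return value is the process exit code.
    sub run {
        my (@argv) = @_;
        my $name = @argv ? $argv[0] : 'world';
        print "hello, $name\n";
        return 0;
    }
}

# The whole "main program": hand the command line to run().
# A real script would finish with: exit $exit_code;
my $exit_code = My::App::Command::run(@ARGV);
```

Because run() takes its arguments as a list and returns a status instead of calling exit itself, a test script can call it directly.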

        I hope you're not recommending 14,000 lines of main program in a single file!

        I prefer writing, and hacking on, single file programs. It's much easier than remembering which module contains what code that's performing some action from a distance. I like to keep as much code as possible in the main program file. That being said, I also use plenty of modules, impose sane order on the source to ease navigation, and document everything.

Node Type: perlquestion [id://1225546]
Approved by Discipulus
Front-paged by LanX