in reply to Module design for loadable external modules containing data

G'day Discipulus,

The title of your post struck a chord: I went through a similar design dilemma a couple of years ago. Although adding the data to the module code seemed like it would work, mixing (mostly static) code with (potentially volatile) data went against the grain.

The solution I came up with used File::ShareDir to access the data files at runtime. Its companion module, File::ShareDir::Install, was used for installation.

File::ShareDir::Install is used in Makefile.PL. It is not used or required in any of your runtime modules. The documentation has details and it's fairly straightforward.
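As a rough sketch of that split, here is what the build side might look like. The distribution name My-Dist, the module name My::Dist and the share/ directory are all made up for illustration; the install_share call and the postamble import follow the pattern given in the File::ShareDir::Install documentation.

```perl
# Makefile.PL (build time only; nothing here is loaded by runtime modules)
use strict;
use warnings;
use ExtUtils::MakeMaker;
use File::ShareDir::Install;

# Install everything under ./share as this distribution's shared data.
install_share 'share';

WriteMakefile(
    NAME               => 'My::Dist',            # hypothetical module name
    VERSION_FROM       => 'lib/My/Dist.pm',      # hypothetical path
    PREREQ_PM          => { 'File::ShareDir' => 0 },           # needed at runtime
    CONFIGURE_REQUIRES => { 'File::ShareDir::Install' => 0 },  # needed to build
);

# File::ShareDir::Install adds its rules to the generated Makefile via postamble.
package MY;
use File::ShareDir::Install qw(postamble);
```

At runtime, a module in that distribution could then locate an installed file with something like File::ShareDir::dist_file('My-Dist', 'some_file.txt'), where the file name is again only a placeholder.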

Here's a very rough, ASCII-art, UML representation of how I used File::ShareDir:

        +-------------+  [compose] +----------------+
        | ParentClass |<>----------| File::ShareDir |
        +-------------+            +----------------+
          A         A [inherit]
          |         |
  +-----------+  +-----------+
  | SubClass1 |  | SubClassN |
  +-----------+  +-----------+
      ^           ^         ^ [manifest]
      |           |         | 
+----------+ +----------+ +----------+
| <<file>> | | <<file>> | | <<file>> |
+----------+ +----------+ +----------+

The DESCRIPTION section of the File::ShareDir documentation discusses some of the bad ways to tackle the design, e.g. using giant data structures or adding data after __DATA__. It then says: "The problem to solve is really quite simple. ...".

I hope that's enough to get you started. I have to get back to $work :-(

— Ken

Re^2: Module design for loadable external modules containing data
by Discipulus (Canon) on Oct 15, 2020 at 07:29 UTC
    Hello kcott and thanks,

    Maybe I'm just confused, but the interesting module File::ShareDir that you suggested does not resolve my doubts. If I read it correctly (I have only glanced at it so far), it is used to store pure data somewhere in the filesystem at install time and to let the program find that data at runtime. My situation is more complex, or I'm too blind to see a clean solution.

    Maybe a rough piece of pseudocode will show it better:

    #
    # modules of my distribution:
    #
    Perl::Teacher
        load_course (then in the program using this module I can call $course->get_description)
        load_lesson (then in the program using this module I can call $lesson->get_steps)

    Perl::Teacher::Course
        new creates a slot for description and one for [lessons]
        get_description
        get_lessons

    Perl::Teacher::Lesson
        new creates a slot for abstract and one for [steps]
        get_abstract
        get_steps

    #
    # modules outside my distribution, intended to be written by someone else and installed separately
    # (note that these modules should have their own tests under /t folder as any module should)
    #
    Perl::Teacher::Course::EN::PerlIntro
        isa Perl::Teacher::Course
        $__PACKAGE__::description = 'a fair intro to perl',
        $__PACKAGE__::lessons = [Perl::Teacher::Course::EN::PerlIntro::01_foreword , ...]

    Perl::Teacher::Course::EN::PerlIntro::01_strictures
        isa Perl::Teacher::Lesson
        $__PACKAGE__::abstract = 'introducing the safety net'
        $__PACKAGE__::steps = [
            # a simple text
            001=>{type=>text,content=>'We now introduce strictures'},
            # check_script_compiles is a method exposed by Perl::Teacher
            002=>{ name=>'script compiles', action=>\check_script_compiles() }
            # a complex test to be run against student's code
            003=>{type=>test,content=>{
                    name => 'strictures',
                    # select_child_of is a method exposed by Perl::Teacher
                    select_child_of => {
                        class => 'PPI::Statement::Include',
                        tests => [
                            ['PPI::Token::Word', qr/^use$/],
                            ['PPI::Token::Word', 'strict']
                        ],
                    },
                    hint => "search perlintro for safety net",
                    docs => ['https://perldoc.perl.org/perlintro.html#Safety-net'],
                }
            }
        ]

    As you can see it is very ugly. I'd like to provide a nice and usable interface to course authors.

    One solution I have is to abstract the lesson content as much as possible and provide a higher-level language to define a lesson; YAML will probably be enough. I don't much like this solution because it will force the author to write YAML instead of Perl code. If I go for YAML then File::ShareDir can be an option: Perl::Teacher::Course::EN::PerlIntro simply writes 01_strictures.yaml to the disk, then the Perl::Teacher object loads the course and all its YAML files from the disk.

    Thanks for the help!

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      You are correct in your assessment of File::ShareDir: it is predominantly for access to static data.

      Your modules should still contain program logic as usual; tests should still be written as usual. All of your static data (e.g. "discourses, assignements, questions") can be handled by File::ShareDir.

      I envisioned some sort of get_text() method implemented at the ParentClass level (from my rough UML example). So, your 001 line would become something like:

      001 => { type => 'text', content => 'strict_intro.txt' }

      Blocks of embedded text can be removed from the module code and replaced with filenames. These can be accessed along the lines of: get_text($hash{001}{content}).
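To make that concrete, here is a minimal sketch of such a get_text() method. It is only one way to do it: the share_dir argument, the share_dir() helper and the distribution name Perl-Teacher are assumptions for illustration. An installed module would fall back to File::ShareDir::dist_dir(); the demo at the bottom passes a temporary directory instead so that it runs without anything being installed.

```perl
use strict;
use warnings;

package ParentClass;

use File::Spec;

sub new {
    my ($class, %args) = @_;
    # share_dir is optional; it mainly exists so tests can point at a temp dir
    return bless { share_dir => $args{share_dir} }, $class;
}

# Hypothetical helper: the directory holding this distribution's data files.
sub share_dir {
    my ($self) = @_;
    return $self->{share_dir} //= do {
        require File::ShareDir;
        File::ShareDir::dist_dir('Perl-Teacher');   # assumed distribution name
    };
}

# Return the full contents of a shared text file, given its file name.
sub get_text {
    my ($self, $filename) = @_;
    my $path = File::Spec->catfile($self->share_dir, $filename);
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot read '$path': $!";
    local $/;                 # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

package main;

use File::Temp qw(tempdir);

# Demo only: create a throwaway share directory with one text file in it.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>:encoding(UTF-8)', File::Spec->catfile($dir, 'strict_intro.txt')
    or die "Cannot write demo file: $!";
print {$out} "We now introduce strictures\n";
close $out;

my %hash   = ('001' => { type => 'text', content => 'strict_intro.txt' });
my $lesson = ParentClass->new(share_dir => $dir);
print $lesson->get_text($hash{'001'}{content});
```

The subclasses from the UML sketch would inherit get_text() unchanged; only the file names they refer to would differ.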

      Now, your text can be available to multiple modules. Changes to that text — whether a simple typo fix, addition of new paragraphs, or a complete rewrite — need only be done in one place (i.e. strict_intro.txt). No code changes are required in any modules. This eliminates a very real source of potential errors: introducing new typos in one or more places; forgetting to modify one or more modules; breaking source code with some newly introduced punctuation character in the text; and so on.

      In addition, this will reduce the size of modules and therefore the memory footprint when they are loaded. This can be further improved upon with just-in-time mechanisms; for instance, only accessing strict_intro.txt when the student clicks on "Show the strictures intro" (or otherwise requests that information).
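A just-in-time mechanism of that kind can be very small. The following sketch uses a hypothetical LazyText class (not part of any module mentioned here) that takes a loader code ref and calls it only on the first request, caching the result afterwards. In real use the loader would read the shared file; here it is a stand-in that returns a fixed string.

```perl
use strict;
use warnings;

package LazyText;

sub new {
    my ($class, %args) = @_;
    # loader: code ref that fetches the text, e.g. by reading a shared file
    return bless { loader => $args{loader}, loads => 0 }, $class;
}

# Fetch the text on first use, then serve it from the cache.
sub text {
    my ($self) = @_;
    unless (exists $self->{text}) {
        $self->{text} = $self->{loader}->();
        $self->{loads}++;
    }
    return $self->{text};
}

# How many times the loader has actually run (for demonstration).
sub loads { $_[0]{loads} }

package main;

my $intro = LazyText->new(
    loader => sub { "We now introduce strictures\n" },   # stand-in for a file read
);

print "loads before: ", $intro->loads, "\n";   # nothing fetched yet
print $intro->text;                            # first request triggers the load
print $intro->text;                            # second request hits the cache
print "loads after: ", $intro->loads, "\n";
```

The loader runs once no matter how often the student asks for the text, and never runs if the student does not ask at all.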

      "As you can see it is very ugly."

      Removing hard-coded text would certainly be a step in the right direction.

      Looking at your pseudocode, another improvement presents itself. You have your steps implemented as an array; the contents of that array are really a hash; the keys of that hash are a sequence of numbers which could easily be derived from the array index. That would look a lot cleaner as:

      [ { ... simple text ... }, { ... check script ... }, { ... complex text ... }, ]

      While I do appreciate that what you presented was pseudocode, this would be the type of thing to look for on the path to beautification.
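Here is a small sketch of deriving the step numbers from the array index. The step contents are abbreviated placeholders and step_labels() is a hypothetical helper. A side benefit: a bare 001 in Perl source is an octal literal, so keys like 008 would not even compile, whereas labels generated with sprintf have no such problem.

```perl
use strict;
use warnings;

# Steps as a plain array of hashes; their order defines their number.
my @steps = (
    { type => 'text',  content => 'strict_intro.txt' },
    { type => 'check', name    => 'script compiles'  },
    { type => 'test',  name    => 'strictures'       },
);

# Hypothetical helper: build zero-padded labels (001, 002, ...) from positions.
sub step_labels {
    my @list = @_;
    return map { sprintf '%03d', $_ + 1 } 0 .. $#list;
}

my @labels = step_labels(@steps);
for my $i (0 .. $#steps) {
    print "$labels[$i]: $steps[$i]{type}\n";
}
```

This prints 001: text, 002: check and 003: test, one per line; inserting or reordering steps needs no renumbering.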

      — Ken

      This is just my two cents and very opinionated but ...

      "One solution I have is to abstract the lesson content as much as possible and provide a higher-level language to define a lesson; YAML will probably be enough. I don't much like this solution because it will force the author to write YAML instead of Perl code."

      You're just getting started here. You can't even just tell them to write YAML because you're really creating a Domain Specific Language. It still won't be all of YAML, it will be a specific subset with expressions that mean something to Perl::Teacher but not YAML. (etc etc, I'm not trying to prove the point, just hoping you can see it)

      The other point I'd like to raise is that, despite OOP, in practice we (programmers) almost always segregate data and code. IMHO, Perl::Teacher should take a cue from things like Text::CSV and XML::LibXML and process data rather than contain it. The code can live on CPAN and the data on GitHub or wherever, and just get pulled via HTTPS or similar.
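A minimal sketch of that "process data rather than contain it" shape follows. Everything in it is an assumption for illustration: the lesson format, the field names, the process_lesson() function and the handler table are invented, and JSON (via the core JSON::PP module) stands in for YAML only so that the example runs without extra modules. The handler table also shows the earlier point about a DSL: only the step types listed there mean anything to the processor.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Lesson data as it might arrive from a file or a download (hypothetical content).
my $lesson_json = <<'JSON';
{
  "abstract": "introducing the safety net",
  "steps": [
    { "type": "text", "content": "We now introduce strictures" },
    { "type": "test", "name": "strictures", "hint": "search perlintro for safety net" }
  ]
}
JSON

# The step types the processor understands; in effect, the DSL's vocabulary.
my %handler_for = (
    text => sub { my ($step) = @_; return "TEXT: $step->{content}" },
    test => sub { my ($step) = @_; return "TEST: $step->{name} (hint: $step->{hint})" },
);

# Decode the lesson and run each step through its handler, rejecting unknown types.
sub process_lesson {
    my ($json) = @_;
    my $lesson = decode_json($json);
    my @out;
    for my $step (@{ $lesson->{steps} }) {
        my $handler = $handler_for{ $step->{type} }
            or die "Unknown step type '$step->{type}'\n";
        push @out, $handler->($step);
    }
    return @out;
}

print "$_\n" for process_lesson($lesson_json);
```

The code knows nothing about any particular course; a new lesson is just another data file, and a new kind of step is one more entry in the handler table.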

      I hope my post isn't discouraging. I think it's great that you're experimenting with an interesting blend of semantics and code.