DarrenSol has asked for the wisdom of the Perl Monks concerning the following question:

I've run across an obstacle in Perl that I didn't expect. Reading through the module tutorial, it appears that this obstacle is intended. Seems odd to me, since the Perl motto is TMTOWTDI.

The way I'd like to do it seems to me the most logical, but Perl, apparently, says I can't do it this way :( Mayhap my gray-matter processing unit is defective...

This is the problem: I have a set directory tree I'm working with, and a number of scripts that process the files.

Seems to me the most straight-forward way to handle this would be to load the directory structure, or specific branches of it, into arrays.

I'd like to make these arrays, and their contents, available to a script by coding them in a module, as I've done with hard-coded variables. Is this something that Perl doesn't want me to do?

I've considered either a script to traverse the directory tree, or callable sub-routines, but these seem like overkill for a set directory tree. Either one seems to me to be "making an easy thing hard" :P

An example, with a subset of the directory tree:

The initial files are downloaded to the DownLoad folder. These are update files for data sets I've already started.

The files in the DownLoad folder are moved to the appropriate "raw" folder, retained as an archive.

The "raw" folder files, which are updates, are appended to the existing data from the files in the "processed" folders. The merged data overwrites the files in the "processed" folders.

The "processed" files are analyzed, with summary reports placed in the "analysis" folders.

Sample tree structure:

\DownLoad (top folder, initial downloaded files)

\DownLoad\Weekly\raw (downloaded Weekly files moved here)
\DownLoad\Weekly\processed (merged files)
\DownLoad\Weekly\analysis (file summary reports)

\DownLoad\Daily\raw (downloaded Daily files moved here)
\DownLoad\Daily\processed (merged files)
\DownLoad\Daily\analysis (file summary reports)

I'm writing scripts to process files in those directories. Then, for example:

foreach $RawFileName ( $DownLoadFolder )
{ distribution script, files moved to "raw" folders }

foreach $AppendFileName ( @WeeklyDirTreeArray ) { (script) }
or
foreach $AnalyzeFileName ( @DailyDirTreeArray ) { (script) }

This works, but I'm only able to do this by pasting the array declaration and initialization code in each script. Seems cumbersome and kludgy. If I change or add to the tree in the future, I'll have to propagate the changes manually to each script - more kludgy copy-and-paste.

Unlike hard-coded variables which most or all of the scripts use, I can't declare the arrays in a module and fill them - which seems to me the logical way to handle hard-coded arrays.

Writing a script to traverse the directory tree would work, but seems like unnecessary overhead for a tree that is already set in place. And the traversing code would be copied-and-pasted into each script, which still seems kludgy. Likewise if I change or add to the directory tree: manual editing of the traverse code in each script.

Creating a callable sub-routine in a module, which would declare and initialize the arrays, would work, but seems to be unnecessarily complicated overhead for a basic programming problem. Again, "making an easy thing hard", which just don't seem very Perl-like :)

Replies are listed 'Best First'.
Re: Fill an array in a module ?
by RonW (Parson) on Sep 13, 2014 at 21:54 UTC

    Variables in a module can be exported (see Exporter) or they can be accessed from outside the module using their fully qualified name:

    # in MyModule.pm
    package MyModule;
    our @SharedArray = (1, 2, 3, 4); # shared
    my @PrivateArray = (5, 6, 7, 8); # not shared
    1;

    # in the calling script
    use MyModule;
    print MyModule::SharedArray[1]; # prints 2

    In Perl, each module has its own namespace. This allows developers of modules and programs to name variables and functions without worrying about the names in programs or other modules.
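For the Exporter route mentioned above, a minimal sketch might look like the following. The module name DirTree and the array names are assumptions made up for illustration, and the module is inlined in a BEGIN block only so that the example runs as a single file; normally that block would be the contents of DirTree.pm.

```perl
use strict;
use warnings;

# Hypothetical module, inlined so the sketch is self-contained.
# In real use, the body of this block would live in DirTree.pm.
BEGIN {
    package DirTree;
    use Exporter 'import';

    # Arrays a script may ask for by name.
    our @EXPORT_OK = qw(@WeeklyDirTree @DailyDirTree);

    our @WeeklyDirTree = map { "\\DownLoad\\Weekly\\$_" } qw(raw processed analysis);
    our @DailyDirTree  = map { "\\DownLoad\\Daily\\$_" }  qw(raw processed analysis);

    # Tell Perl the module is already loaded, so 'use DirTree' below
    # does not look for a file on disk.
    $INC{'DirTree.pm'} = __FILE__;
}

# Import one array by name; the other stays reachable by its full name.
use DirTree qw(@WeeklyDirTree);

print "$_\n" for @WeeklyDirTree;        # the three Weekly folders
print "$DirTree::DailyDirTree[0]\n";    # \DownLoad\Daily\raw
```

With this layout, a change to the tree is made once in the module and every script that imports the array picks it up.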

      print MyModule::SharedArray[1]; # prints 2

      Should be $MyModule::SharedArray[1]. The variable name still needs a sigil for full qualification.

      In Perl, each module has its own namespace.

      Each module can be given its own namespace (as shown in the code example via package) or, indeed, any namespace. It's not automatic.

        Each module can be given its own namespace (as shown in the code example via package) or, indeed, any namespace. It's not automatic.

        :) Yup, it's not automatic, but "module" has a specific meaning :) A module is a namespace (package Foo) associated with a file (Foo.pm). A module is a package you can use or require. So if it doesn't do these things we usually don't call it a module :)

        Thanks for the reminder. I hadn't been using namespace qualifiers for these modules. I'd gotten away with that, somehow. But as the script set grows, it'll likely get cumbersome to have distinct names for arrays and variables.

        When I looked over the set I'd already scripted, it was obvious that some of those names were painfully long - and a fair number already included a kind of 'namespace' module abbreviation prefix in them.

        Yes, I was still scripting Saturday night...actually, until early Sunday morning. I used to try an alarm clock to remind me I should sleep at some point, but I just kept turning it off...

      Thanks for the information. I hadn't been using 'our' or 'my' in modules, but I see from your example why I should. And the namespace qualifying as well. A bit of a task to modify the modules, but thankfully I'm not so deep into this one that it's a mountain of script to go through:)

      As the old saying goes...well, how I think it should go, in this case:
      "If it ain't broke, don't fix it. But just because it ain't broke don't mean it's working properly."

        But if it's not working properly, then it is broke.
Re: Fill an array in a module ?
by einhverfr (Friar) on Sep 14, 2014 at 02:24 UTC

    First, I would generally avoid using a module-global array. This is because there are plenty of functions which modify the array and thus if you suddenly end up with paths missing you have to figure out what shifted or popped your array. That could be a lot of fun, not....

    So while it is true you can declare the array with "our" scope and access it by its fully qualified name, this strikes me as fairly brittle because such access is read-write.

    For this reason I would certainly recommend declaring the array with a "my" scope and having a function that returns a copy of the array. Something like:

    my @paths;

    sub get_paths {
        init_paths() unless @paths;   # parentheses needed: init_paths is defined further down
        return @paths;
    }

    sub init_paths {
        ... # logic to initialize your path list
    }

    Note this passes the array back by value, so callers get a copy, which effectively gives other modules read-only access to it.
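To illustrate the copy-on-return behaviour, here is a small runnable sketch. The package name PathList and the hard-coded paths are assumptions for the demo, standing in for whatever initialization logic is actually used.

```perl
use strict;
use warnings;

{
    package PathList;    # hypothetical name, inlined so the sketch is one file

    my @paths;           # private: not reachable as @PathList::paths

    # Stand-in for the real initialization logic.
    sub init_paths {
        @paths = (
            '\DownLoad\Weekly\raw',
            '\DownLoad\Weekly\processed',
            '\DownLoad\Weekly\analysis',
        );
    }

    # Fill the list on first use, then hand back a copy.
    sub get_paths {
        init_paths() unless @paths;
        return @paths;
    }
}

my @copy = PathList::get_paths();
shift @copy;                          # damages only the caller's copy

my @fresh = PathList::get_paths();    # the module's list is untouched
print scalar(@copy), " ", scalar(@fresh), "\n";    # prints "2 3"
```

The shift changes only @copy; the next call to get_paths still returns all three paths.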

      I hadn't considered the overwrite problem. Probably because I write perfect script which never oversteps boundaries :P

      At this point, I'm going with a modified version of what you recommend. Instead of using a function to return a copy of the array, the function refills the global 'our' array with the original data.

      My thinking is that since I'll be calling a function to ensure the array I use has the original data, skip the copy and just refill the global 'our' array. None of those arrays are too large, so the file read to fill the array shouldn't impact the run time of the script too much.
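A rough sketch of that refill idea, under assumptions: the package name DirTree and the sub name refill_weekly are made up, and a hard-coded list stands in for the file read described above.

```perl
use strict;
use warnings;

{
    package DirTree;    # hypothetical name, inlined so the sketch is one file

    our @WeeklyDirTree; # global, so any script can read and write it

    # Overwrite the global array with the original data.
    # Assumption: a hard-coded list replaces the file read here.
    sub refill_weekly {
        @WeeklyDirTree = map { "\\DownLoad\\Weekly\\$_" } qw(raw processed analysis);
        return scalar @WeeklyDirTree;
    }
}

DirTree::refill_weekly();
pop @DirTree::WeeklyDirTree;                    # some code clobbers the shared array
print scalar(@DirTree::WeeklyDirTree), "\n";    # prints 2

DirTree::refill_weekly();                       # restore before the next use
print scalar(@DirTree::WeeklyDirTree), "\n";    # prints 3
```

The trade-off is that every user of the array has to remember to call the refill function first; the array itself is still writable by anyone.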

      Hmmm. That last sentence brings to mind the phrase "Famous last words." I wonder why...

Re: Fill an array in a module ?
by DarrenSol (Acolyte) on Sep 13, 2014 at 18:01 UTC

    Addendum:

    I'd considered keeping the directory tree listings in files, then loading the files into arrays. Like this:

    (WeeklyDirTree.Csv)
    \DownLoad\Weekly\raw
    \DownLoad\Weekly\processed
    \DownLoad\Weekly\analysis

    I'd then declare the arrays in each script. Like this:

    my @WeeklyDirTree;   # a plain declaration already gives an empty array

    Then I'd read the files into the arrays.

    That would automate propagation of directory tree additions or changes. But I'd still be doing the copy-and-paste routine in each script for the repetitive open-and-read of the directory tree files. Which still seems cumbersome and kludgy.
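One way to avoid repeating the open-and-read code is to write it once as a sub, which could then live in a module. A minimal sketch follows; the sub name load_dir_tree is an assumption, and the listing file is a throwaway temp file created only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one directory path per line from a listing file.
sub load_dir_tree {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    chomp(my @dirs = <$fh>);
    close $fh;
    return grep { length } @dirs;    # drop empty lines
}

# Demo only: build a temporary listing file with the Weekly branch.
my ($out, $listing) = tempfile(UNLINK => 1);
print {$out} "\\DownLoad\\Weekly\\raw\n",
             "\\DownLoad\\Weekly\\processed\n",
             "\\DownLoad\\Weekly\\analysis\n";
close $out;

my @WeeklyDirTree = load_dir_tree($listing);
print "$_\n" for @WeeklyDirTree;
```

Each script would then need only one line per tree, and additions to the listing file propagate without touching the scripts.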