
A good way to input data into a script w/o an SQL database

by ObiPanda (Acolyte)
on Sep 10, 2023 at 00:58 UTC ( [id://11154352]=perlquestion )

ObiPanda has asked for the wisdom of the Perl Monks concerning the following question:

I have a working script, so the following is not strictly necessary. For the most part, I just want to "improve" the script by separating the data from the core script, and I'm asking for suggested good/best practices.

The data entries are each elements in an array of hashes. Here is an example of some of the hash keys.

{ Sub_Name => "", Archive_File => "", Lib_Sub_Path => "", }

Is it best practice to leave it in the script, or to save it to one data file or a list of data files? For my specific situation, there is no reason to set up an SQL server and database. I just want to use a simple file, or files in a directory, which contain the information I need to run the script with a little bit of abstraction.

Another question: is there a good CPAN module to use to input it into an array of hashes, as already implemented in the script? I have no idea what to look for.

Replies are listed 'Best First'.
Re: A good way to input data into a script w/o an SQL database
by GrandFather (Saint) on Sep 10, 2023 at 03:37 UTC

    As others have suggested, your requirements are somewhat vague. However, if you mean "I have configuration data baked into my code. Is there a better way?", then you may find YAML of use:

    use YAML;
    ...
    my $options = YAML::LoadFile($fileName);
    ...
    YAML::DumpFile($fileName, $options);

    An options file might look like:

    ---
    alarm: AlarmHorn.wav
    courts:
      - avail: 1
        name: Court 1
        players: 4
        size: 4
      - avail: 1
        name: Court 2
        players: 4
        size: 4
      - avail: 1
        name: Court 3
        players: 4
        size: 4
    fontSize: 12
    geom: 630x470
    time: 720

    There are of course many other similar modules, in particular flavour of the decade seems to be JSON.
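For the JSON route, a minimal sketch of loading the original poster's array of hashes might look like the following. It assumes the core JSON::PP module; the helper name load_subscriptions and the sample record are made up for illustration, and the data file is written to a temporary location only to keep the example self-contained.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw(tempfile);

# Read a JSON file holding an array of objects and return it as an
# array of hashes (AoH). load_subscriptions is a hypothetical name.
sub load_subscriptions {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $data = JSON::PP->new->utf8->decode($text);
    die "$file: expected a top-level JSON array\n" unless ref $data eq 'ARRAY';
    return @$data;
}

# For a self-contained demo, write a sample data file first; in practice
# this would be a hand-edited file kept next to the script.
my ($out, $file) = tempfile(UNLINK => 1);
print {$out} <<'JSON';
[
  { "Sub_Name": "Morph", "Archive_File": "Morph Archive.txt", "Phone_transfer": 0 }
]
JSON
close $out;

my @Subscription = load_subscriptions($file);
print "Loaded $_->{Sub_Name} ($_->{Archive_File})\n" for @Subscription;
```

The rest of the script can then loop over @Subscription exactly as it would over a hard-coded list.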

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: A good way to input data into a script w/o an SQL database
by kcott (Archbishop) on Sep 10, 2023 at 02:00 UTC

    G'day ObiPanda,

    As a general rule of thumb, code will change infrequently but data may change often, so keep the two separate.

    Your question is extremely vague and it's difficult to provide a more concrete answer. If you were to present even a simple example of your current "code with data" script, I'm sure we could give a much better answer. As it stands, any response will be pure guesswork.

    — Ken

      I was just thinking of a way to input the data into an array of hashes from one or more external files which would hold the data. The array is then used to provide some simple configuration options to a program. This script is intended to run on both Linux and Windows, obviously changing the paths.

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use autodie;

      # For Windows
      my $Subscriptions_Path  = "G:/Subscriptions";
      my $Phone_Sync          = "G:/Sync/PHONE/Main/Music";
      my $Temp_Files_Location = "G:/Subscriptions/tmp";
      my $Wait_Time           = 10;

      # Subscription DATA sets
      my @Subscription = (
          {
              Sub_Name       => "Morph",
              Archive_File   => "Morph Archive.txt",
              Lib_Sub_Path   => "$Subscriptions_Path/Morph",
              Phone_transfer => 0
          },
      );

      for (@Subscription) {
          print "Beginning Subscription Service for $_->{Sub_Name} \n";
          print "\nCompleting Subscription Service for $_->{Sub_Name} \n\n\n";
          sleep int(rand($Wait_Time));
      }
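Since the script is meant to run on both Linux and Windows with different paths, one possible sketch is to key the base directory on Perl's $^O variable. The Windows path comes from the script above; the Linux path and the helper name subscriptions_path are placeholders invented for this example.

```perl
use strict;
use warnings;

# Base directories per operating system. The Windows path is the one from
# the posted script; the Linux path is a made-up placeholder.
my %Base_Path = (
    MSWin32 => "G:/Subscriptions",
    linux   => "/srv/subscriptions",
);

# Return the base path for the given OS name, defaulting to the OS the
# script is currently running on ($^O).
sub subscriptions_path {
    my ($os) = @_;
    $os //= $^O;
    return $Base_Path{$os} // die "No path configured for OS '$os'\n";
}

print "Windows base path: ", subscriptions_path('MSWin32'), "\n";
print "Linux base path:   ", subscriptions_path('linux'), "\n";
```

The same table could equally live in the external data file, so that only one file needs editing per machine.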

        Thanks for posting your sample code. Much clearer now!

        In fact, it spookily reminded me of a node I wrote a while back: Data-driven Programming: fun with Perl, JSON, YAML, XML...

        As you can see from that node, I faced a similar problem to what you are asking about.

        Generally, I'm a fan of defining a table of properties, as you have done, because it helps to separate the code from the data. After asking my question, I ended up leaving the script alone with its table of properties hard-wired in the script. It was very flexible that way and proved to be easy to maintain over many years. Having the build script itself under version control was essential, of course, to allow us to examine changes to our automated builds over time.

        Update: see also: Data Structure References

Re: A good way to input data into a script w/o an SQL database
by LanX (Saint) on Sep 10, 2023 at 03:13 UTC
    It depends on what you want to do with the data, and on how many entries your AoH has.
    • Are you just processing it sequentially?
    • Are you querying/filtering it?
    • Do you need to keep it all in memory?
    Do you have ...
    • Time constraints?
    • Memory constraints?
    • Disk constraints?

    For instance, in the easiest case, a trivial solution is to keep it in a text table (CSV) with 3 columns.

    That way you can read it line by line and load your data very efficiently.
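A rough sketch of that line-by-line approach, using only core Perl, is shown below. The column names are taken from the hash keys in the question; the second sample row is invented, and the plain split assumes that fields never contain commas, quotes, or newlines (the CPAN module Text::CSV handles those cases).

```perl
use strict;
use warnings;

# Column names are taken from the question's hash keys.
my @columns = qw(Sub_Name Archive_File Lib_Sub_Path);

# Parse one line of a simple CSV table into a hash reference.
# Assumes fields never contain commas, quotes, or newlines.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my %row;
    @row{@columns} = split /,/, $line, -1;    # -1 keeps trailing empty fields
    return \%row;
}

# Sample data read through an in-memory filehandle so the sketch is
# self-contained; a real script would open the CSV file instead.
my $csv = "Morph,Morph Archive.txt,G:/Subscriptions/Morph\n"
        . "Other,Other Archive.txt,G:/Subscriptions/Other\n";
open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";

my @Subscription;
while (my $line = <$fh>) {
    next if $line =~ /^\s*$/;                 # skip blank lines
    push @Subscription, parse_line($line);
}
close $fh;

print "$_->{Sub_Name} -> $_->{Lib_Sub_Path}\n" for @Subscription;
```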

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

Re: A good way to input data into a script w/o an SQL database
by bliako (Monsignor) on Sep 12, 2023 at 09:04 UTC

    I often find myself in the same situation, needing to read a script's configuration from a file. This is my experience:

    Firstly, I decided to separate configuration data from code (edit: rephrased that for clarity). The configuration is read at start time from a file named by a CLI option to the script (e.g. --config myapp.conf), or it is passed as an argument to subroutines/constructors. In the latter case I am flexible: I pass either a configuration filename or a Perl data structure created earlier by reading the configuration file (as a means of caching the configuration data, which I assume is static).
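The two entry points described above could be sketched roughly as follows, using the core modules Getopt::Long and JSON::PP. The helper names get_config and config_file_from_args, the default file name, and the JSON file format are assumptions made for this illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use JSON::PP;

# Accept either a ready-made hash reference or the name of a JSON file
# and always return a hash reference. get_config is a hypothetical helper.
sub get_config {
    my ($source) = @_;
    return $source if ref $source eq 'HASH';     # already loaded (cached)
    open my $fh, '<:raw', $source or die "Cannot open $source: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return JSON::PP->new->utf8->decode($text);
}

# Extract the --config option from an argument list (normally @ARGV).
sub config_file_from_args {
    my (@args) = @_;
    my $file = 'myapp.conf';                     # assumed default name
    GetOptionsFromArray(\@args, 'config=s' => \$file)
        or die "Usage: $0 [--config FILE]\n";
    return $file;
}

my $file = config_file_from_args('--config', 'other.conf');
print "Would read configuration from $file\n";

# Passing an in-memory structure skips the file read entirely.
my $config = get_config({ Wait_Time => 10 });
print "Wait_Time is $config->{Wait_Time}\n";
```

A script would typically call config_file_from_args(@ARGV) once at start-up and hand the resulting structure to whatever needs it.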

    There are many choices for the configuration data file format. As a (personal) rule, I avoid storing the configuration as a Perl data structure and eval()'ing that code, because it is a wide-open door for your script to execute unknown/injected user-specified code pretending to be data. (I know that you said your script is only for you and the data is static, living somewhere in the distribution's homedir.)
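To make the risk concrete, here is a small sketch in which a "configuration" file written in Perl syntax does more than return data. The file contents and the variable name $side_effect are invented for the demonstration; the extra statement is harmless here but stands in for arbitrary injected code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A "configuration" file in Perl syntax. Besides returning data it also
# runs an extra statement, which stands in for injected code.
my ($out, $file) = tempfile(UNLINK => 1);
print {$out} <<'CONF';
$main::side_effect = "code in the config file was executed";
+{ Wait_Time => 10 };
CONF
close $out;

our $side_effect = '';
my $config = do $file;      # compiles and runs the file as Perl code
die "Could not load $file: ", $@ || $! unless ref $config eq 'HASH';

print "Wait_Time: $config->{Wait_Time}\n";
print "Side effect: $side_effect\n";
```

A data-only format such as JSON or YAML avoids this, because the parser builds data structures without running the file as Perl code.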

    Re: Storable: it is an interesting alternative to directly eval()'ing Perl data structures from separate data/configuration files (which can easily become arbitrary Perl code!). Unfortunately it comes with a security warning about loading untrusted Storable-based data even with default settings. And they know better than me.

    At this point, I should mention that you can invent your own format. But since what you want is pretty standard, then what's the point? Additionally, I often have unicode content in my configuration files and this is correctly handled by the modules I am mentioning here (and tested by them, ouch!). And that's a vote against writing your own.

    So, my quest ended with a choice of YAML, JSON, or the so-called "windows INI" config files (you know the [Section1] thingy) - surely there must be others, excuse my oversight. INI can be read/written with Config::Tiny but M$ may decide to put a copyright on the file format in the future - who knows? And direct you to an online .NET service, hardwired with ChatGPT data collection and captchas, perhaps biometric, just for reading your files.

    So, for me, the options narrowed down to YAML or JSON.

    My choice is JSON. Mainly because I try to avoid any programming tool which uses and counts spaces as part of the code. I find space-counting (space is the only invisible ASCII character > 31) irritating. I detest these products, personally, as I was never a fan of the de Sade inflection (edit: neither von Masoch's). Memo-to-self: create a format which utilises the audible bell instead of space. Hey! why not backspace?

    And so JSON then. This can be read/written easily with JSON/JSON::XS.

    JSON has disadvantages for readability: no multi-line strings and no comments are allowed. And double quotes must be escaped. So readability is bad, especially for long strings, as in my case (multiline bash scripts). Manual editing can be tedious for long strings. Additionally, on parsing errors, JSON/JSON::XS print the location as the number of characters from the start and print just a tiny bit of the faulty section, which makes it very difficult for me to pinpoint the error (just a few characters which invariably end up being only spaces, tabs and newlines). So, huge frustration for me there.
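One way to soften the character-offset problem is to translate the offset into a line and column before reporting the error. The sketch below uses the core JSON::PP module; decode_with_position is a hypothetical helper, and it depends on the parser's error message containing the phrase "character offset N", which is an assumption about the message wording rather than a documented interface.

```perl
use strict;
use warnings;
use JSON::PP;

# Decode JSON text; on failure, translate the "character offset N" in the
# parser's message into a line and column. decode_with_position is a
# hypothetical helper and relies on the wording of the error message.
sub decode_with_position {
    my ($text) = @_;
    my $data;
    return $data if eval { $data = JSON::PP->new->decode($text); 1 };
    my $error = $@;
    if ($error =~ /character offset (\d+)/) {
        my $offset = $1;
        my $before = substr $text, 0, $offset;
        my $line   = 1 + ($before =~ tr/\n//);              # newlines before the offset
        my $column = $offset - rindex($before, "\n");       # 1-based column in that line
        die "JSON error near line $line, column $column: $error";
    }
    die $error;
}

# Deliberately broken input: the value on the third line is not valid JSON.
my $bad = <<'JSON';
{
  "a": 1,
  "b": oops
}
JSON

my $message = '';
eval { decode_with_position($bad); 1 } or $message = $@;
print $message;
```

The reported column may be off by a character depending on where the parser stopped, but the line number is usually enough to find the mistake.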

    That said, and to be fair, YAML supports both comments and multi-line strings. But, alas, it has the dreaded space as king! (naked!)

    Shameless Plug: As I said, I make heavy use of configuration files. I started with plain JSON, but I wanted to allow for comments, multiline strings/verbatim/heredoc sections and template-style variables. I eventually called all the above enhancements "Enhanced JSON" (ad hoc term) and whipped up a module to read and write these files, with the existing JSON doing all the heavy lifting. The module is Config::JSON::Enhanced. But for what you presented here, plain JSON is just enough. Or YAML.

    bw, bliako

        Apropos make's idiosyncrasy, I stand *nix-biased because I respected that rule since the beginning. I never complained about it! And to be frank, I have rarely been bitten by it and the remedy was painless. I guess, for me, idiosyncrasies like this can be tolerated if the coding is super complicated (the logic or the language) and counting spaces allows you some time to subconsciously contemplate your code. In fact, it can be an advantage. Here is a field for research.

        Reading your linked On Interfaces and APIs I realised I had forgotten to mention XML. All is well. I can safely leave out that bureaucratic invention and admit it in the hall of fame as a rare example of the format being lengthier than the content.

        bw, bliako

        Some tools I suspect you'd try to avoid...Python...

        I would...
        But perhaps not for the reason you suggested!

      I find space-counting (space is the only invisible ASCII character > 31) irritating.

      I hate to be the bearer of bad tidings, but should you start working with other languages, you'll soon discover that there are many invisible characters and they can really put a monkeywrench in the works at times. One of the most well-known among these is the zero-width space (ZWS) character. It's an invisible character that does not even occupy a space on the screen! Meanwhile, said character affects pattern matching (the word is no longer spelled the same if it has a ZWS in the middle of it), it affects such things as word-wrapping, especially common in unspaced languages like Thai, Lao, Karen, Burmese, etc., and can generally be a nuisance if unexpected and/or the programmer is too naive to account for its existence.

      And it's rumored (probably quite true) that various three-letter governmental organizations use the zero-width space in English texts, such as on their websites, placed randomly throughout the text in a fingerprint fashion to track people. Text exactly matching what their server served on a particular occasion can be linked back to the one thus served by this invisible fingerprint. So, it may be worthwhile to purge texts of this sly character.

      And there are others: thin spaces, hair spaces, etc.--over 20 of these invisible characters in the unicode spectrum.
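As a small illustration of the pattern-matching problem, the sketch below hides a zero-width space (U+200B) inside a name and then strips a handful of zero-width characters before matching. The list of characters removed is a choice made for this example, not an exhaustive set, and blindly removing joiners can be wrong for scripts that rely on them.

```perl
use strict;
use warnings;

# U+200B ZERO WIDTH SPACE hidden inside what looks like "Morph".
my $name = "Mor\x{200B}ph";

# The hidden character breaks an otherwise obvious match.
my $matches_before = $name =~ /Morph/ ? 1 : 0;

# Remove zero-width characters: ZWSP, ZWNJ, ZWJ, WORD JOINER, and the BOM.
(my $clean = $name) =~ s/[\x{200B}-\x{200D}\x{2060}\x{FEFF}]//g;

my $matches_after = $clean =~ /Morph/ ? 1 : 0;

printf "match before: %d, match after: %d, length %d -> %d\n",
    $matches_before, $matches_after, length($name), length($clean);
```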



      I definitely recommend YAML. It is a strict superset of JSON, FWIW. It is ideal for data driven applications especially those that might currently be tightly coupled with the program logic. I've been using it with increasing frequency. YAMLScript is also a thing.
        [...] YAML. It is a strict superset of JSON, FWIW.

        No: Re^2: conf file in Perl syntax


        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: A good way to input data into a script w/o an SQL database
by eyepopslikeamosquito (Archbishop) on Sep 10, 2023 at 06:27 UTC

    As others have noted, we can't provide good answers to your vague questions until you present us with something more concrete. You've already explained that you are not a programmer, which is good because it helps us provide answers at the appropriate level.

    To help us help you, we'd like to see your specific problem clarified with a sample script that you've written that we can run. Hopefully, these links clarify:

    > Another question: is there a good CPAN module to use to input it into an array of hashes, as already implemented in the script? I have no idea what to look for.

    As a general caution, I wouldn't go rushing to download CPAN modules just yet. Though CPAN is great, there are hidden snags. What if the CPAN module has a security vulnerability? What if the author abandons it? How quickly can you isolate/troubleshoot a bug in its code?

    So I'd recommend first trying to solve your problem using core Perl and core Perl modules. After you post that attempt here, we can advise on whether a CPAN module is appropriate.

Re: A good way to input data into a script w/o an SQL database
by karlgoethebier (Abbot) on Sep 10, 2023 at 05:53 UTC
Re: A good way to input data into a script w/o an SQL database
by Discipulus (Canon) on Sep 11, 2023 at 07:24 UTC
    Hello ObiPanda,

    you already got very good answers but, strangely to me, some options were not mentioned.

    Storable is a core module perfectly suited to loading Perl data structures from a file. I have two commands saved to convert between Storable and YAML:

    sto2yaml=perl -MYAML -MStorable -e "print Dump @{retrieve ($ARGV[0])};" $*
    yaml2sto=perl -e "use YAML (LoadFile); use Storable qw(nstore); @ar = LoadFile($ARGV[0]); nstore(\@ar, $ARGV[1])" $*
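Inside a script, a Storable round trip for an array of hashes could be sketched as below. The record is modelled on the original poster's data and the temporary file is just a stand-in for a real storage location; note that the resulting file is binary, so it is not meant for hand editing.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

my @Subscription = (
    { Sub_Name => "Morph", Archive_File => "Morph Archive.txt", Phone_transfer => 0 },
);

# A temporary file stands in for the real storage location.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

nstore(\@Subscription, $file);    # network byte order, portable across machines

my $restored = retrieve($file);   # returns a reference to the stored array
print "Restored $_->{Sub_Name} ($_->{Archive_File})\n" for @$restored;
```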

    Also Data::Dumper is a core module able to store and retrieve perl datastructures.

    Then there is also Sereal, a CPAN module offering great performance.

    If you are only interested in something like an external configuration, you may enjoy my Modules as configuration files


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      These are great for serialising Perl data structures out to disk then reading them back in later, but not so great for hand editing some options and not so great for recording in a revision control system.

      As it turns out the OP is working with text data so something like JSON, YAML or even .inf files are likely to be a better match.

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: A good way to input data into a script w/o an SQL database
by eyepopslikeamosquito (Archbishop) on Sep 10, 2023 at 06:33 UTC

    > I have a working script

    Great. Why not post it? (or a cut-down version of it, if it's too long).

Re: A good way to input data into a script w/o an SQL database
by duelafn (Parson) on Sep 10, 2023 at 13:21 UTC

    As was mentioned, one of the many configuration file formats (YAML, JSON, TOML, ...) is usually what one would reach for first. If those are insufficient or unwieldy due to amount or structure of data, SQLite (DBD::SQLite) is a plain file serverless database.

    Good Day,

Re: A good way to input data into a script w/o an SQL database
by BillKSmith (Monsignor) on Sep 10, 2023 at 19:03 UTC
    Your intuition that the code and data should be separated is correct. Data usually falls into one of two categories, which should also be kept separate. In the first, any change makes all previous versions obsolete. In the other, you have several versions of the data in production at the same time (e.g. one version for each customer, user, location, etc.). Changes to each version can be made independently, but are probably rare. The choice of technology to store each type of data is not very important. Ideally, it should be easy for the responsible person(s) to view and/or change the data, and difficult for anyone else. In making that choice, you must also consider what resources are shared by users of the data. Are they all on one computer, one LAN, or the internet? Is it possible to share a database?

    I hope I have been a help in specifying the kind of information we need in order to help you.


Node Type: perlquestion [id://11154352]
Approved by kcott