Re: A good way to input data into a script w/o an SQL database

I often find myself in the same situation, needing to read a script's configuration from a file. This is my experience:

Firstly, I decided to separate configuration data from code (edit: rephrased that for clarity). The configuration is read at start time, either from a file named by a CLI option to the script (e.g. --config myapp.conf) or passed as an argument to subroutines/constructors. In the latter case I am flexible: pass either a configuration filename or a configuration Perl data structure created earlier by reading the configuration file (as a means of caching the configuration data, which I assume is static).
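
A minimal sketch of that arrangement, assuming the configuration is stored as JSON and parsed with the core JSON::PP module. The names load_config and config_from_args, and the default filename, are hypothetical and invented for illustration only.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use JSON::PP;

# Hypothetical helper: accepts either a filename or an already-parsed
# configuration hashref, so callers can cache the parsed data and pass it on.
sub load_config {
    my ($source) = @_;
    return $source if ref($source) eq 'HASH';    # already parsed, reuse it

    open(my $fh, '<:raw', $source) or die "Cannot open '$source': $!";
    my $text = do { local $/; <$fh> };           # slurp the whole file
    close($fh);
    return JSON::PP->new->utf8->decode($text);   # dies on malformed JSON
}

# Hypothetical helper: picks the filename from a --config option.
# The default filename is an assumption.
sub config_from_args {
    my @args = @_;
    my $file = 'myapp.conf';
    GetOptionsFromArray(\@args, 'config=s' => \$file)
        or die "Usage: $0 [--config FILE]\n";
    return load_config($file);
}
```

A script would call config_from_args(@ARGV) once at start-up and hand the resulting hashref to its subroutines or constructors, which can call load_config() on whatever they receive.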

There are many choices for the configuration data file format. As a rule (mine) I avoid storing the configuration as a Perl data structure and reading and eval()'ing that code, because it is a wide-open door for your script to execute unknown/injected user-specified code pretending to be data. (I know that you said your script is only for you and the data is static, living somewhere in the distribution's homedir.)
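
To illustrate the risk, here is a small sketch with a made-up "configuration" string that is really Perl code. The variable names and the file contents are hypothetical; the harmless assignment stands in for whatever an attacker might inject. JSON::PP (core) is used only to show that a data-only parser refuses the same text.

```perl
use strict;
use warnings;
use JSON::PP;

$main::side_effect = 0;

# Hypothetical contents of a "configuration" file written as Perl code.
# The first statement stands in for anything an attacker might inject.
my $file_contents = '$main::side_effect = 1; +{ retries => 3 }';

# String eval returns the hashref, but it also ran the extra statement.
my $config = eval $file_contents;
die "eval failed: $@" if $@;

# A JSON parser treats the same text strictly as data and rejects it.
my $parsed = eval { JSON::PP->new->decode($file_contents) };
my $json_rejected = defined($parsed) ? 0 : 1;
```

After this runs, $config looks like an ordinary configuration hash, yet $main::side_effect has been changed behind the script's back.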

Re: Storable: it is an interesting alternative to directly eval()'ing Perl data structures from separate data/configuration files (which can easily become arbitrary Perl code!). Unfortunately it comes with a security warning about loading untrusted Storable-based data even with default settings. And they know better than me.

At this point, I should mention that you can invent your own format. But since what you want is pretty standard, then what's the point? Additionally, I often have unicode content in my configuration files and this is correctly handled by the modules I am mentioning here (and tested by them, ouch!). And that's a vote against writing your own.

So, my quest ended with a choice of YAML, JSON, or the so-called "Windows INI" config files (you know the [Section1] thingy) - surely there must be others, excuse my oversight. INI can be read/written with Config::Tiny but M$ may decide to put a copyright on the file format in the future - who knows? And direct you to an online .NET service, hardwired with ChatGPT data collection and captchas, perhaps biometric, just for reading your files.

So, for me, the options narrowed down to YAML or JSON.

My choice is JSON. Mainly because I try to avoid any programming tool which uses and counts spaces as part of the code. I find space-counting (space is the only invisible ASCII character > 31) irritating. I detest these products, personally, as I was never a fan of the de Sade inflection (edit: nor of von Masoch's). Memo-to-self: create a format which utilises the audible bell instead of space. Hey! why not backspace?

And so JSON then. This can be read/written easily with JSON/JSON::XS.
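
A minimal round-trip sketch. It uses the core JSON::PP module as a stand-in (an assumption on my part, chosen so the example runs without installing anything); JSON::XS offers the same new/utf8/pretty/canonical/encode/decode methods. The configuration data is made up.

```perl
use strict;
use warnings;
use JSON::PP;

# ->utf8 makes encode() return UTF-8 bytes and decode() expect them;
# ->pretty and ->canonical give indented output with sorted keys.
my $json = JSON::PP->new->utf8->pretty->canonical;

# Hypothetical configuration data; the greeting is four Greek letters
# written with escapes so this source file stays plain ASCII.
my $config = {
    name     => 'myapp',
    greeting => "\x{3ba}\x{3b1}\x{3bb}\x{3b7}",
    retries  => 3,
};

my $bytes = $json->encode($config);   # write these bytes to a file opened with '>:raw'
my $again = $json->decode($bytes);    # what reading that file back would give
```

Because ->utf8 is set, the file should be written and read in raw mode, without an :encoding layer, otherwise the text gets encoded twice.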

JSON has disadvantages for readability: no multi-line strings and no comments are allowed. And double quotes must be escaped. So readability is bad, especially for long strings as is my case (multiline bash scripts). Manual editing can be tedious for long strings. Additionally, on parsing errors, JSON/JSON::XS print the location as the number of characters from the start and print just a tiny bit of the faulty section, which makes it very difficult for me to pinpoint the error (just a few characters which invariably end up being only spaces, tabs and newlines). So, huge frustration for me there.
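
One way to ease that particular pain is to translate the reported offset into a line and column yourself. The sketch below assumes the error text contains "character offset N" (as JSON::PP's messages do) and that the input is a character string decoded without ->utf8, so the offset counts characters. decode_with_location is a hypothetical name, not part of any module.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: decodes a JSON character string and, on failure,
# rethrows the error prefixed with a line and column computed from the offset.
sub decode_with_location {
    my ($text) = @_;
    my $data;
    return $data if eval { $data = JSON::PP->new->decode($text); 1 };

    my $error = $@;
    # Assumption: the parser's message contains "character offset N".
    if ($error =~ /character offset (\d+)/) {
        my $offset = $1;
        my $before = substr($text, 0, $offset);
        my $line   = 1 + ($before =~ tr/\n//);          # newlines before the offset
        my $column = $offset - rindex($before, "\n");   # 1-based column on that line
        die "JSON error near line $line, column $column: $error";
    }
    die $error;    # unknown message format, pass it through unchanged
}
```

The position is only as exact as the offset the parser reports, hence "near", but a line number is usually enough to find the offending string in an editor.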

That said, and to be fair, YAML supports both comments and multi-line strings. But, alas, it has the dreaded space as king! (naked!)

Shameless Plug: As I said, I make heavy use of configuration files. I started with plain JSON, but I wanted to allow for comments, multiline strings/verbatim/heredoc sections and template-style variables. I eventually called all the above enhancements "Enhanced JSON" (ad hoc term) and whipped up a module to read and write these files, with the existing JSON module doing all the heavy lifting. The module is Config::JSON::Enhanced. But for what you presented here, plain JSON is just enough. Or YAML.

bw, bliako


Replies are listed 'Best First'.
Re^2: A good way to input data into a script w/o an SQL database by eyepopslikeamosquito (Archbishop) on Sep 12, 2023 at 11:06 UTC
> I try to avoid any programming tool which uses and counts spaces as part of the code. I find space-counting (space is the only invisible ASCII character > 31) irritating. I detest these products, personally, as I was never fan of the de Sade inflection. Memo-to-self: create a format which utilises the audible bell instead of space. Hey! why not backspace?

Thanks for making me laugh. :)

Some tools I suspect you'd try to avoid:

- Python - for its significant indentation
- make - earned a dishonourable mention at On Interfaces and APIs for decreeing the actions of a rule must start with a tab
- Whitespace (programming language)
- Acme::Bleach - appeared two years before the Whitespace programming language ... so I think TheDamian was the first!

Other examples welcome.

See also: Syntactically Significant Whitespace Considered Harmful (c2.com)
Re^3: A good way to input data into a script w/o an SQL database by bliako (Abbot) on Sep 12, 2023 at 14:28 UTC
Apropos `make`'s idiosyncrasy, I stand *nix-biased because I have respected that rule since the beginning. I never complained about it! And to be frank, I have rarely been bitten by it and the remedy was painless. I guess, for me, idiosyncrasies like this can be tolerated if the coding is super complicated (the logic or the language) and counting spaces allows you some time to subconsciously contemplate your code. In fact, it can be an advantage. Here is a field for research.

Reading your linked On Interfaces and APIs I realised I had forgotten to mention XML. All is well. I can safely leave out that bureaucratic invention and admit it to the hall of fame as a rare example of the format being lengthier than the content.

bw, bliako
Re^3: A good way to input data into a script w/o an SQL database by Bod (Parson) on Sep 12, 2023 at 21:59 UTC
> Some tools I suspect you'd try to avoid...Python...

I would... But perhaps not for the reason you suggested!
Re^2: A good way to input data into a script w/o an SQL database by perlfan (Parson) on Sep 15, 2023 at 14:21 UTC
I definitely recommend YAML. It is a strict superset of JSON, FWIW. It is ideal for data driven applications especially those that might currently be tightly coupled with the program logic. I've been using it with increasing frequency. YAMLScript is also a thing.
Re^3: A good way to input data into a script w/o an SQL database by afoken (Chancellor) on Sep 15, 2023 at 14:49 UTC
> [...] YAML. It is a strict superset of JSON, FWIW.

No: Re^2: conf file in Perl syntax

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: A good way to input data into a script w/o an SQL database by Polyglot (Chaplain) on Sep 24, 2023 at 02:13 UTC
> I find space-counting (space is the only invisible ASCII character > 31) irritating.

I hate to be the bearer of bad tidings, but should you start working with other languages, you'll soon discover that there are many invisible characters and they can really put a monkeywrench in the works at times. One of the most well-known among these is the zero-width space (ZWS) character. It's an invisible character that does not even occupy a space on the screen! Meanwhile, said character affects pattern matching (the word is no longer spelled the same if it has a ZWS in the middle of it), it affects such things as word-wrapping, especially common in unspaced languages like Thai, Lao, Karen, Burmese, etc., and can generally be a nuisance if unexpected and/or the programmer is too naive to account for its existence.

And it's rumored (probably quite true) that various three-letter governmental organizations use the zero-width space in English texts, such as on their websites, placed randomly throughout the text in a fingerprint fashion to track people. Text exactly matching what their server served on a particular occasion can be linked back to the one thus served by this invisible fingerprint. So, it may be worthwhile to purge texts of this sly character. And there are others: thin spaces, hair spaces, etc.--over 20 of these invisible characters in the Unicode spectrum.

Blessings,
~Polyglot~
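
For what it's worth, purging or locating the zero-width space (U+200B) in Perl could be sketched as follows. Both subroutine names are hypothetical, and the input is assumed to be a decoded character string rather than raw UTF-8 bytes. Only U+200B is handled; the character class could be widened to other invisible characters, but some of them (and the ZWS itself in the unspaced languages mentioned above) carry meaning, so stripping them blindly may change how the text wraps or renders.

```perl
use strict;
use warnings;

# Hypothetical helper: removes ZERO WIDTH SPACE (U+200B) from a character string.
sub strip_zero_width_space {
    my ($text) = @_;
    $text =~ s/\x{200B}//g;
    return $text;
}

# Hypothetical helper: returns the 0-based offsets of every U+200B in the text.
sub find_zero_width_space {
    my ($text) = @_;
    my @offsets;
    while ($text =~ /\x{200B}/g) {
        push @offsets, pos($text) - 1;    # pos() points just past the match
    }
    return @offsets;
}
```

Running find_zero_width_space() first shows whether a text carries any such characters before deciding to strip them.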