davebaker has asked for the wisdom of the Perl Monks concerning the following question:

More of a meditation here than a question, but perhaps it will generate some insight for me and others.

Think back on when you were just getting started with programming or computer science, and how you might have naively expected that the results of a program could be used in a later invocation of the same program. Maybe it's just a hits counter, but more often for me the need has been to store a record -- let's say it's a student's name, place of birth, and list of hobbies. So there ya have a Perl structure. All the teacher (who learned Pascal back in the '90s and thinks in terms of "records" of data) wants to do is store the information on a server or other computer so that he or she can call up the students' records at some point in the future -- maybe to browse all the information, or maybe to read or edit a particular student's information, or maybe to delete the information of a student who drops the class. One day the teacher might want to generate a report that lists all the students who were born in Florida and also like to run as a hobby, or otherwise get crazy with some other munging of the information.

So far, only two students are enrolled:

{ name => 'Bob Kowalski', home_town => 'Vero Beach', home_state => 'Florida', hobbies => [ 'ham radio', 'Perl programming', 'running' ], }

And:

{ name => 'Kranessa Evans', home_town => 'Dallas', home_state => 'Texas', hobbies => [ 'Perl programming', 'writing', 'polo' ], }

So, how to store the information about those students? It looks like an array of hashes to me, which sounds complicated to a non-Perl programmer, but hey, the teacher has read the Camel book.

In another world, the task could be this simple, albeit using some by-hand code here for the initial set of students' information:

#!/opt/perl

use strict;
use warnings;

my @students = (
    {
        name       => 'Bob Kowalski',
        home_town  => 'Vero Beach',
        home_state => 'Florida',
        hobbies    => [ 'ham radio', 'Perl programming', 'running' ],
    },
    {
        name       => 'Kranessa Evans',
        home_town  => 'Dallas',
        home_state => 'Texas',
        hobbies    => [ 'Perl programming', 'writing', 'polo' ],
    },
);

store_to_file( '/data/students.db', \@students );
    # This make-believe easy-peasy function would be built into Perl.

I can live with the complexity of needing to tell my program where to store the data in my fantasy -- i.e., a file name -- and with needing to know that it's probably better to use a "reference thingie" to the list of information when passing the information into store_to_file() rather than passing the data in the list outright (again, the hypothetical teacher who wants to use Perl has read the Camel book).

But, because there is no such built-in store_to_file() function for my students' records, I need to figure out how to use the MLDBM module (which has been my go-to solution, and I wrote a Perl.com article about it back in 2006). Or perhaps I need to add something like these many lines to my program (taken from Recipe 11.13 in the Perl Cookbook, 2nd edition):

use Fcntl qw( :DEFAULT :flock );    # for O_RDWR, O_CREAT, LOCK_EX, LOCK_SH
use Storable qw( nstore_fd retrieve_fd );

sub store_to_file {
    my ( $db, $data_ref ) = @_;
    sysopen( DF, $db, O_RDWR | O_CREAT, 0606 )
        or die "Can't open '$db', stopped: $!";
    flock( DF, LOCK_EX )
        or die "Can't get exclusive lock on '$db' for writing, stopped: $!";
    nstore_fd( $data_ref, *DF )
        or die "Can't store data: $@";
    truncate( DF, tell(DF) );
    close(DF);
    return 1;
}

sub retrieve_from_file {
    # Certainly going to need this complementary function in the future,
    # to read my students' information.
    my $db = shift;
    unless ( -e $db ) {
        # Initialize upon first-ever usage of this function if no student
        # information has been stored before, so it won't crash in that
        # instance because the data file doesn't exist yet. Maybe there's an
        # easier, softer way to instantiate the data file in this admittedly
        # unusual scenario (which I managed to encounter last night).
        store_to_file( $db, [] );
    }
    open( DF, "< $db" )
        or die "Can't open '$db' for reading, stopped: $!";
    flock( DF, LOCK_SH )
        or die "Can't get shared lock on '$db' for reading, stopped: $!";
    my $data_ref = retrieve_fd(*DF);
    close(DF);
    return $data_ref;
}

Another solution for data storage is to use a database such as SQLite or Postgres, which probably is the most robust solution because, well, databases are optimized for data, and avoid the need for file locking in particular. But my hypothetical teacher would need to learn a good bit of SQL, figure out how to create relational tables especially to accommodate that pesky "field" for multiple hobbies, and would need to get an SQL server up and running (although the file-based SQLite technique could be used). "Gosh, all I want to do is store my students' information using a program I'd like to write."
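To be fair, for my two students the relational approach might be smaller than I fear. Here's a rough sketch of what it could look like with DBI and SQLite (assuming DBD::SQLite is installed; the two-table layout and all the names are just my guess at one way to handle that multi-valued hobbies field):

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is installed

# An in-memory database for illustration; use a real file name to persist.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# Two tables: one row per student, one row per (student, hobby) pair.
$dbh->do(q{
    CREATE TABLE students (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        home_town  TEXT,
        home_state TEXT
    )
});
$dbh->do('CREATE TABLE hobbies (student_id INTEGER, hobby TEXT)');

my $ins = $dbh->prepare(
    'INSERT INTO students (name, home_town, home_state) VALUES (?, ?, ?)');
$ins->execute( 'Bob Kowalski', 'Vero Beach', 'Florida' );
my $id = $dbh->last_insert_id( undef, undef, 'students', 'id' );
$dbh->do( 'INSERT INTO hobbies VALUES (?, ?)', undef, $id, $_ )
    for 'ham radio', 'Perl programming', 'running';

# The crazy-munging report: students born in Florida who like to run.
my $rows = $dbh->selectall_arrayref(q{
    SELECT s.name FROM students s
    JOIN   hobbies h ON h.student_id = s.id
    WHERE  s.home_state = 'Florida' AND h.hobby = 'running'
});
print $rows->[0][0], "\n";    # Bob Kowalski
```

Still more ceremony than a built-in store_to_file(), but no server, no file locking to write by hand, and the reporting comes for free.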

Another solution -- doh -- would be to buy the teacher a copy of Excel or an equivalent. But the teacher is obsessed with the idea of writing a custom program. Or, heck, use Notepad or even get a typewriter and some paper out of the closet. But assume the teacher is set on storing the information on a computer in a way that can be later used in programs-yet-to-be-written.

The store_to_file() and retrieve_from_file() functions certainly could be abstracted into a module called Students.pm that exports them, so that the teacher's program could say "use Students qw( store_to_file retrieve_from_file );" and then call store_to_file() and retrieve_from_file() as needed -- without needing to remember why it's better for Storable to use the more-portable nstore rather than store (because the program might some day be moved to a server whose operating system has different C functions used by Storable, such that the data can't be retrieved on the new server -- been there, done that, although I might be thinking of a different module), or how to configure file locking for the data (because another teacher might be reading my data file at the same time that I'm writing to it).

But -- seriously? All of this, just to reliably store a couple of students' records so that I can read them later, assuming I want to write a Perl program to do it?

I know that using this much code for a file-based solution for record-type ("complex" and yet not really all that complex) data structures is an effective way to do it, and I enjoy doing it because I enjoy Perl, but last night as I cobbled a program together that just needed to store a data structure that's not a simple key => "scalar" sort of record, I had the thought -- "Can it really be this hard? Couldn't there be a store_to_file() function built into the language, in order to provide an easy way to store some record-type data for later use?"

I wonder if that question might spur some insights that I haven't seen.

Replies are listed 'Best First'.
Re: Persistent data structures -- why so complicated?
by Discipulus (Canon) on Mar 11, 2021 at 18:13 UTC
    Hello davebaker,

    Probably I missed the point, but if you are programming in Perl I do not see the complexity in using Storable. Anyway, first of all take a look at a recent thread: Banal Configuration Languages

    That said, you have a lot of options besides Storable. The basic Data::Dumper dumps structures that can be eval-ed to get the data structure back.
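    For example, a minimal round trip with the core Data::Dumper might look like this sketch (the file name and variable names are made up; $Data::Dumper::Purity helps with nested structures):

```perl
use strict;
use warnings;
use Data::Dumper;

my @students = (
    { name => 'Bob Kowalski', hobbies => [ 'ham radio', 'running' ] },
);

# Dump with a known variable name so the string can be eval-ed back.
local $Data::Dumper::Purity = 1;
open my $out, '>', 'students.pl' or die "students.pl: $!";
print {$out} Data::Dumper->Dump( [ \@students ], ['students'] );
close $out;

# Later (or in another program): slurp the file and eval it.
my $code = do { local ( @ARGV, $/ ) = ('students.pl'); <> };
my $students;    # the Dump() call above wrote "$students = [ ... ];"
eval $code;
die $@ if $@;
print $students->[0]{name}, "\n";    # Bob Kowalski
```

    The obvious caveat: eval-ing a file executes whatever is in it, so only do this with files you trust.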

    If you prefer more human-readable solutions, there is YAML, or JSON, which is very common outside the Perl world.
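    With the core JSON::PP, for instance, a store/retrieve pair is only a few lines (the file name is invented):

```perl
use strict;
use warnings;
use JSON::PP;    # in core since Perl 5.14

my @students = (
    { name       => 'Bob Kowalski',
      home_state => 'Florida',
      hobbies    => [ 'ham radio', 'Perl programming', 'running' ] },
);

my $json = JSON::PP->new->pretty->canonical;

# Store: encode the whole array to a human-readable JSON file.
open my $fh, '>', 'students.json' or die "students.json: $!";
print {$fh} $json->encode( \@students );
close $fh;

# Retrieve: slurp the file and decode it back into the same structure.
my $text = do { local ( @ARGV, $/ ) = ('students.json'); <> };
my $back = $json->decode($text);
print $back->[0]{hobbies}[2], "\n";    # running
```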

    I used something like perl -MYAML -MStorable -e "print Dump @{retrieve ($ARGV[0])};" and perl -e "use YAML (LoadFile); use Storable qw(nstore); @ar = LoadFile($ARGV[0]); nstore(\@ar, $ARGV[1])" to translate YAML and Storable formats.

    A plethora of Config::* modules are also available.

    You can have comma-separated data in external files, but also after the __DATA__ token. So: many, many options for every taste.

    HTH

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Thanks! I think I got freaked out by the Perl Cookbook's lengthy example of how to use file locking with Storable, but now that I've looked at the outstanding documentation for the module, I see there's even a lock_store function (and lock_nstore for a more portable though slightly slower solution) that apparently would replace those details (along with lock_retrieve). Sometimes the Cookbook (2003) tells me more than I need to know, but Time Marches On and I forget that the state of the art is different from when the second edition of the Cookbook was published 18 years ago. It's still the book I typically reach for first.
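      With those two functions, the whole Cookbook recipe seems to collapse into a couple of calls -- here's a quick sketch (the file name is mine):

```perl
use strict;
use warnings;
use Storable qw( lock_nstore lock_retrieve );

my @students = (
    { name    => 'Bob Kowalski',
      hobbies => [ 'ham radio', 'Perl programming', 'running' ] },
    { name    => 'Kranessa Evans',
      hobbies => [ 'Perl programming', 'writing', 'polo' ] },
);

my $db = 'students.db';

# lock_nstore takes an exclusive flock, writes in network (portable)
# byte order, and releases the lock -- the whole recipe in one call.
lock_nstore( \@students, $db );

# lock_retrieve takes a shared lock for the read.
my $back = lock_retrieve($db);
print scalar @$back, " students\n";    # 2 students
```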

      Would a comma-separated data format work in my example? Each student "record" has a key that holds multiple values (the student's hobbies), so I'm thinking it doesn't have the one-key-to-one-scalar-value relationship that a CSV format requires.

        Of course CSV would suffice for the example you gave, but it will be a lot more complicated if you have nested structures.

        You can combine all the given examples, e.g. use the Database example by using DBD::CSV or next level, tie that CSV database handle with Tie::Hash::DBD and use your hash without changing the rest of your program.

        One piece of advice: don't underestimate CSV; it is not as "simple" as it looks. *Do* use a module, like Text::CSV_XS and/or Text::CSV.

        For the hobbies, you would otherwise require a serialization format like Sereal or Storable.

        If hobbies is the *only* field with multiple values, just drop them at the end of the record, so you can restore them like this:

use 5.14.2;
use warnings;

use Text::CSV_XS "csv";
use Data::Peek;

my $data = csv (in => *DATA);
my @hdr  = @{shift @$data};
my $n    = $#hdr;
my @students = map {
    my %h;
    @h{@hdr} = splice @$_, 0, $n;
    $h{hobbies} = [ @$_ ];
    \%h;
    } @$data;
DDumper \@students;
__END__
name,home_town,age,hobbies
Angie Green,Denver,12,Horses,Painting,Karaoke,Ham Radio
Jamie Brown,London,34
Mark White,Madrid,56,Cooking,Perl,Running
Mary Black,Dublin,30,Baking,Singing,Swimming,Programming Perl

        -->

[ { age       => '12',
    hobbies   => [ 'Horses', 'Painting', 'Karaoke', 'Ham Radio' ],
    home_town => 'Denver',
    name      => 'Angie Green'
    },
  { age       => '34',
    hobbies   => [],
    home_town => 'London',
    name      => 'Jamie Brown'
    },
  { age       => '56',
    hobbies   => [ 'Cooking', 'Perl', 'Running' ],
    home_town => 'Madrid',
    name      => 'Mark White'
    },
  { age       => '30',
    hobbies   => [ 'Baking', 'Singing', 'Swimming', 'Programming Perl' ],
    home_town => 'Dublin',
    name      => 'Mary Black'
    }
  ]

        Enjoy, Have FUN! H.Merijn
Re: Persistent data structures -- why so complicated?
by Tux (Canon) on Mar 11, 2021 at 17:46 UTC

    The suggested Storable is core and should work fine, but there are many more - often much nicer - ways to store persistent hashes. IMHO a more modern and better way is Sereal.

    Note that for some of these, complex keys or code refs in values may cause havoc.

    Personally I like to tie my hashes if I need persistence.

    With my Tie::Hash::DBD you can even tie your hashes to be stored in a database. See this script (which you should modify to your needs) to benchmark the available methods. This page shows benchmarks for the methods I found on my system (higher values in the last column are better).


    Enjoy, Have FUN! H.Merijn
Re: Persistent data structures -- why so complicated?
by tybalt89 (Monsignor) on Mar 11, 2021 at 16:21 UTC
    NAME
        Storable - persistence for Perl data structures

    SYNOPSIS
        use Storable;
        store \%table, 'file';
        $hashref = retrieve('file');
      and
      corelist Storable

      Data for 2021-01-23
      Storable was first released with perl v5.7.3

      tho I'm not sure I understood the OP .. probably a case of TL;DR

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Persistent data structures -- why so complicated? (updated)
by haukex (Archbishop) on Mar 11, 2021 at 18:53 UTC

    The thread "Is there any cache mechanism for running perl script" might be an interesting read, where the replies show how to store data structures using Storable, JSON, or Path::Class (the latter only for simple arrays). As for editing a file "in-place", I showed several variations of that in this node.

    Note that, with modern databases supporting JSON*, and with modern Perl modules, databases are IMHO pretty nice to work with too. For example, given a Postgres table:

    CREATE TABLE students (
        name     TEXT NOT NULL PRIMARY KEY,
        hometown TEXT NOT NULL,
        grade    TEXT,
        data     JSONB NOT NULL DEFAULT '{}'::jsonb
    );

    Here's some code using Mojo::Pg showing INSERT, SELECT, and UPDATE:

    use Mojo::Pg;

    my $pg = Mojo::Pg->new('postgres://localhost:54321/testing')
        ->password('BarFoo');

    $pg->db->insert('students', {
        name     => 'Bob Kowalski',
        hometown => 'Vero Beach, FL',
        data     => { -json => {
            hobbies => [ 'ham radio', 'Python programming', 'running' ],
        } },
    });
    $pg->db->insert('students', {
        name     => 'Kranessa Evans',
        hometown => 'Dallas, TX',
        data     => { -json => {
            hobbies => [ 'Perl programming', 'writing', 'polo' ],
        } },
    });

    my $res = $pg->db->select('students')->expand;
    while ( my $rec = $res->hash ) {
        if ( grep {/perl/i} @{ $rec->{data}{hobbies} } ) {
            $pg->db->update( 'students',
                { grade => 'A' }, { name => $rec->{name} } );
        }
    }
    $res->finish;

    For a database that doesn't even require a server, Mojo::SQLite has a very similar API to the above (Edit: though I haven't used JSON in SQLite yet).

    Update 2: I've modified the connection string in the above to be more useful than using the postgres superuser - and normally one wouldn't hardcode the password of course, see e.g. ~/.pgpass. The following is how I spun up the test database. I'm using port 54321 instead of the default 5432.

    $ docker run --rm -p54321:5432 --name pgtestdb -e POSTGRES_PASSWORD=FooBar -d postgres:13
    # wait a few seconds for it to start
    $ echo "CREATE USER $USER PASSWORD 'BarFoo'; CREATE DATABASE testing; GRANT ALL PRIVILEGES ON DATABASE testing TO $USER;" | psql postgresql://postgres:FooBar@localhost:54321
    $ psql postgres://localhost:54321/testing
    # log in and create the above table
    # run the above Perl script
    $ PGPASSWORD=BarFoo psql postgres://localhost:54321/testing -c 'SELECT * FROM students'
    $ docker stop pgtestdb

    * Update 3: One more thought here: I'm definitely not advocating just dumping everything in an unstructured JSON blob - one should still try to follow the rules of good database design and normalization as much as possible. But sometimes there are cases where nested data structures can be an advantage, in which case having support in the database for them can be very useful. Update 4: In that regard, erix's reply below!

      For a database that doesn't even require a server, Mojo::SQLite has a very similar API to the above
      Or standard DBI and DBD::SQLite if you don't feel the need to drag a random web application framework into your code.
        Or standard DBI and DBD::SQLite if you don't feel the need to drag a random web application framework into your code.

        On the one hand, I understand the sentiment, as loading it does add overhead (although perhaps you should have said so more clearly instead of just expressing your apparent distaste for it), on the other, I think Mojo does have its advantages for simplifying writing code - IMHO it's extremely Perlish. I did list the Mojo solution last (Edit: did you look at the threads I linked to?), since it has the largest learning curve, but I thought it was worth mentioning in the spirit of TIMTOWTDI.

        Since the OP does talk about "All the teacher ... wants to do is store the information on a server or other computer so that he or she can call up the students' records at some point in the future -- maybe to browse all the information, or maybe to read or edit a particular student's information ...", using a web interface as a solution is not unthinkable. There's no hint of needing to process billions of records or accesses or other hints that the aforementioned performance overhead is a concern - though it's certainly worth keeping in mind.

      For PostgreSQL jsonb one might mention the advantage of json(b) indexing: it makes access to large json tables fast (rule of thumb: 100x faster - of course, it only matters for large datasets).

      (see the PostgreSQL fine manual on JSON-indexing)

Re: Persistent data structures -- why so complicated?
by 1nickt (Canon) on Mar 11, 2021 at 15:52 UTC

    Hi, you might like Redis (https://redis.io) for this type of thing.

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Persistent data structures -- why so complicated?
by bliako (Abbot) on Mar 12, 2021 at 11:20 UTC

    Storable or Sereal was the first thing that came to my mind, but then I realised that they do not allow for searching individual fields (e.g. a select) while in-store: you have to read the data back into memory and search there. Additionally, when you want to insert a new record that references, or is referred to by, an already existing record, you need to read everything back into memory, add the record, and store it all again. So, (as Choroba said on CB:) "it's hard to make guesses without a detailed spec or use cases." I will add that each solution comes at a price.

    OTOH I completely understand your "frustration" that there is no simpler way than creating tables especially to represent complex and nested data structures. But perhaps NoSQL is what you are looking for; see MongoDB::Tutorial. And there are all sorts of modules which abstract the tedious parts of SQL away, e.g. DBIx::Class and SQL::Abstract. I am just mentioning those as further reading; I am nowhere near an expert on these. Other Monks are.

    bw, bliako