Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am struggling to figure out how to use threads::shared variables, in this particular case Hash, amongst multiple threads. To make things as simple as I can right now I am not trying to run in threads but just code what I hope is the simplest of verifications. In this particular case I get a SQLite reference back from a connect but cannot figure out the correct reference/dereference syntax. In this particular example in the 3rd section of the code the SQLite prepare fails because the prior connect reference isn't the same, in my large use this would have a shared Hash of SQLite 'open' DB's (the connect references) which could be cycled through doing queries.

From the documentation on threads::shared I am 'sure' my lack of understanding is related to the comment "Shared variables can only store scalars, refs of shared variables, or refs of shared data (discussed in next section):" but I simply do not understand enough to figure out what to do.

Any help is appreciated.

Doug

the code example;

[root@itx2 ~]# cat x7.pl use DBI; use threads; use threads::shared; my $driver = 'SQLite'; my $database = ''; my $dsn = ''; my $userid = ''; my $password = ''; my $dbh; my $dsn; my $stmt; my $sth; my $rv; my @row; my %hash; my %hsh :shared; my $int :shared; my $String; my $Sdatabase; $Sdatabase = $ARGV[0]; $dsn = "DBI:$driver:dbname=$Sdatabase"; $dbh = DBI->connect($dsn, $userid, $password, { PrintError => 1, Ra +iseError => 1, AutoCommit => 0, ReadOnly => 1 }); $stmt = qq(select count ( * ) from asnipv4 ); $sth = $dbh->prepare( $stmt ); $rv = $sth->execute(); while( @row = $sth->fetchrow_array()) { $Iknt++; $String = join ',', @row; print "$Iknt -> $String \n"; } $dbh->disconnect(); print 'one' . "\n"; $dbh = DBI->connect($dsn, $userid, $password, { PrintError => 1, Ra +iseError => 1, AutoCommit => 0, ReadOnly => 1 }); $hash{$Sdatabase} = $dbh; $stmt = qq(select count ( * ) from asnipv4 ); $sth = $hash{$Sdatabase}->prepare( $stmt ); $rv = $sth->execute(); while( @row = $sth->fetchrow_array()) { $Iknt++; $String = join ',', @row; print "$Iknt -> $String \n"; } $hash{$Sdatabase}->disconnect(); delete $hash{$Sdatabase}; print 'two' . "\n"; $dbh = DBI->connect($dsn, $userid, $password, { PrintError => 1, Ra +iseError => 1, AutoCommit => 0, ReadOnly => 1 }); $hsh{"$Sdatabase"} = $dbh; $stmt = qq(select count ( * ) from asnipv4 ); $sth = $hsh{"$Sdatabase"}->prepare( $stmt ); $rv = $sth->execute(); while( @row = $sth->fetchrow_array()) { $Iknt++; $String = join ',', @row; print "$Iknt -> $String \n"; } $hsh{$Sdatabase}->disconnect(); delete $hsh{$Sdatabase}; print 'tre' . "\n"; exit; [root@itx2 ~]#

the execution failure;

[root@itx2 ~]# perl x7.pl /opt/asd/sql/dbi-indexed-GeoLite2.db-sqlite 1 -> 512468 one 2 -> 512468 two Invalid value for shared scalar at x7.pl line 60. [root@itx2 ~]#

Replies are listed 'Best First'.
Re: threads::shared referencing confusion
by ikegami (Patriarch) on Aug 20, 2024 at 19:19 UTC

    This is the failing line: $hsh{"$Sdatabase"} = $dbh;

    A database handle is a complex thing. It may reference file components and external library resources. They can't be shared. (The specific violation is that $dbh is a reference, but it doesn't reference something shared.)

    Create the database handle in the thread that will use it.

      yes, that is the SQLite error point but the actual error seems to me to have happened much earlier. While trying to get the original DB handle into a shared hash things clearly go wrong. The handle is changed to a reference and SQLite is clearly erroring on the reference as it is not the open DB handle. This is what I don't understand, how do I get the original handle into a shared Hash like I do in a non-shared Hash?

      SO show, at least thus far, the error isn't with SQLite a bit of code changed to demonstrate;

      use DBI; use threads; use threads::shared; my $driver = 'SQLite'; my $database = ''; my $dsn = ''; my $userid = ''; my $password = ''; my $dbh; my $dsn; my $stmt; my $sth; my $rv; my @row; my %hash; my %hsh :shared; my $int :shared; my $String; my $Sdatabase; $Sdatabase = $ARGV[0]; print 'Database fully qualified path of existing SQLite DB -> ' . $Sda +tabase . "\n"; $dsn = "DBI:$driver:dbname=$Sdatabase"; print 'Database data source name for connect -> ' . $dsn . "\n"; $dbh = DBI->connect($dsn, $userid, $password, { PrintError => 1, Ra +iseError => 1, AutoCommit => 0, ReadOnly => 1 }); print 'Simple database handle in $dbh from connect -> ' . $dbh . " and + DB is now open\n"; $hash{"$Sdatabase"} = $dbh; print 'Simple database handle in non-shared Hash, $Sdatabase -> ' . $S +database . ' and $hash{"$Sdatabase"} -> ' . $hash{"$Sdatabase"} . "\ +n"; $stmt = qq(select count ( * ) from asnipv4 ); $sth = $hash{"$Sdatabase"}->prepare( $stmt ); $rv = $sth->execute(); while( @row = $sth->fetchrow_array()) { $Iknt++; $String = join ',', @row; print "$Iknt -> $String \n"; } $hash{$Sdatabase}->disconnect(); delete $hash{$Sdatabase}; # $int = $dbh; #print 'Try to get simple database handle in shared scalar $int as -> + ' . $int . "\n"; $int = shared_clone(\$dbh); print 'Try to get simple database handle in shared scalar $int via sha +red_clone(\$dbh) -> ' . $int . " <- the original handle is gone and n +ow we have a reference\n"; $hash{"$Sdatabase"} = $int; print 'Try to get shared scalar $int database handle into a non-shared + Hash $Sdatabase -> ' . $Sdatabase . ' and $hash{"$Sdatabase"} -> ' +. $hash{"$Sdatabase"} . " <- works but we have a reference and not th +e handle\n"; $hsh{"$Sdatabase"} = $int; print 'Try to get shared scalar $int database handle into a shared Has +h $Sdatabase -> ' . $Sdatabase . ' has $hsh{"$Sdatabase"} -> ' . $hsh +{"$Sdatabase"} . " <- BUT the simple = results in some other referenc +e\n"; $stmt = qq(select count ( * ) from asnipv4 ); $sth = $hsh{"$Sdatabase"}->prepare( $stmt ); $rv = $sth->execute(); while( @row = $sth->fetchrow_array()) { $Iknt++; $String = join ',', @row; print "$Iknt -> $String \n"; } $hsh{$Sdatabase}->disconnect(); delete $hsh{$Sdatabase}; exit;

      yields;

      [root@itx2 ~]# perl x7.pl /opt/asd/sql/dbi-indexed-GeoLite2.db-sqlite- +old Database fully qualified path of existing SQLite DB -> /opt/asd/sql/db +i-indexed-GeoLite2.db-sqlite-old Database data source name for connect -> DBI:SQLite:dbname=/opt/asd/sq +l/dbi-indexed-GeoLite2.db-sqlite-old Simple database handle in $dbh from connect -> DBI::db=HASH(0x561e4b5e +5480) and DB is now open Simple database handle in non-shared Hash, $Sdatabase -> /opt/asd/sql/ +dbi-indexed-GeoLite2.db-sqlite-old and $hash{"$Sdatabase"} -> DBI::d +b=HASH(0x561e4b5e5480) 1 -> 511665 Try to get simple database handle in shared scalar $int via shared_clo +ne(\$dbh) -> REF(0x561e4b2b9570) <- the original handle is gone and n +ow we have a reference Try to get shared scalar $int database handle into a non-shared Hash $ +Sdatabase -> /opt/asd/sql/dbi-indexed-GeoLite2.db-sqlite-old and $ha +sh{"$Sdatabase"} -> REF(0x561e4b2b9570) <- works but we have a refere +nce and not the handle Try to get shared scalar $int database handle into a shared Hash $Sdat +abase -> /opt/asd/sql/dbi-indexed-GeoLite2.db-sqlite-old has $hsh{"$S +database"} -> REF(0x561e4b5e5630) <- BUT the simple = results in some + other reference Can't call method "prepare" on unblessed reference at x7.pl line 66. [root@itx2 ~]#

      It seems to me the problem is clearly that the shared Hash isn't getting the DB handle and I cannot figure out from the documentation what syntax is needed to get something 'outside the shared world', i.e. the SQLite DB handle, 'into the shared world'.

      Once I get past that point then the prepare at line 66 will either work or still fail.

      On a separate note you point on the DB handle is well taken. There won't actually be any concurrent DB handle access between threads as I implement a shared spin-lock mechanism to signal whether or not one of the threads is operating on the DB handle and only the thread that acquires the spin-lock operates on that DB handle (of course releasing the spin-lock when it is done). So far in 'many' threads not using sharing (and thusly each actually having their own unique local in scope to the subroutine that does the work) I still use the spin-lock mechanism to single single access per DB handle to achieve non-concurrent actions (select at this time but insert or replace code is done but not yet being tested/verified because I am chasing the sharing of a single DB handle Hash error right now). Other mechanisms work similarily to signal a DB handle should-be/was released, LRU to allow a DB handle not accessed 'recently' (some other code doing housekeeping on open handles) being eligible for closure, etc which allows me to control how many handles can be open at any given time and being driven by LRU to handle opened/needing-open. Why all this? 1) to remove redundant/repetative DB open/close (always open == ready to be worked on right now), 2) many threads buring through many open handles to achieve higher query (and later insert or replace) performance. There are actual time-series DB's holding a range of 'rows' but I have all the partitioning code working which limits the size of a time-series DB chunk/partition, i.e. SQLite DB's do not grow in size in an uncontrolled way (they are time range and size limited (partitioned) (in some simple to think about way like an Oracle RDBMS with field based partitioning), and the various code based that add-replace/delete/query all find their way throught the range of partitions they need to satisfy their function add-replace/delete/query...

      So, yes your caution is well taken and having 'worried' about it early on I am working to 'code around' the lack of SQLite concurrency (while not complaining that SQLite should have some sort of concurrency).

      Code and paragraph tags added by Grandfather. See Markup in the Monastery

        "lack of SQLite concurrency" - SQLite has plenty of concurrency for reading see the docs: SQLite Threading Modes. I would prototype your testing app with the default Serialization mode and see what performance results.

        I ran a short benchmark to see how fast the open/close function is. On my > 10-year-old machine, more than 2,000 per second. Your machine is probably much faster than mine.

        You should be aware that a prepare operation can be slow. If you are stashing DBhandles, then for performance reasons, you might need to stash the prepared SQL queries for at least common situations. Note prepare is table specific.

        I was shocked at this spin-lock idea. That is almost certainly not what you should do. Here is a description from Intel spin-locks harmful. In the first instance don't worry about this and let SQLite serialize things and see how fast it is. Then probably for multi-thread in your scheme, you need flock mechanism.

        How big is your DB? You should be aware that it is possible to dynamically manage the amount of memory that SQLite uses. I have tested this and actually used it in one of my applications. I was creating a heavily indexed DB. I inserted all the rows, then ran RAM usage up and did the indexing en masse for a huge performance gain. If all or most of the DB is memory resident, this cuts down on disk access. You could experiment with different sizes and let SQLite decide what to put into memory and when. Of course, once your app becomes more memory and CPU-bound, it won't make much difference to have more threads than you have physical cores.

        My advice is to do some benchmarking and then let that guide the design decisions.

        use strict; use warnings; use Time::HiRes qw( gettimeofday ); use DBI; my $dbfile = "testDB.sqlite"; my $start = gettimeofday; for (0..10000){ my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","",{RaiseErro +r => 1}) or die "Couldn't connect to database: " . DBI->errstr; $dbh->disconnect; } my $end=gettimeofday; print "",($end-$start),"\n"; #4.54577684402466 # so approximately 2,000 per second

        It seems to me the problem is clearly that the shared Hash isn't getting the DB handle

        Yup. Just like I said. You can't share a DB handle, so you can't place it in a shared variable. On top of explaining this, I told you what you must do.

        On a separate note you point on the DB handle is well taken. There won't actually be any concurrent DB handle access [...]

        Apparently not, since I didn't talk about concurrency at all! So I didn't make whatever point you think I made about concurrency.

        I cannot read your code - you need code tags for formatting.

        If I understand ikegami correctly, you can only use a DB handle in the thread that created it. Regardless of any concurrent operations.

        Perhaps you can back up and explain your application a bit better? Also be aware that write performance of SQLite is very slow in a multi-writer scenario because the writer must obtain an exclusive lock on the DB file. Concurrent reads from multiple threads is fine. Each thread must have its own DB handle.

        Update: I would recommend that you come up with a simple model of what you expect your application to be doing. You can fiddle with this to run various benchmarks, etc. I would let the DB handle all of the file locking. There is no need for you to serialize the requests yourself. Again, I am a bit lost because I don't understand your app.

        This needs <code> tags around the code here, which is unreadable currently.