thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

First of all thanks for the time and effort to assist me with my problem.

I have written a short script that produces some data (at a 1-second period) and writes it to a text file.

Main script (producing data)

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Fcntl qw(:flock);                        # import LOCK_* and SEEK_END constants
use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END);    # SEEK_SET=0 SEEK_CUR=1 ...

my @doc_write;
my @array;
my $file = "test.txt";

sub add {
    open (DATA, "+<", $file) or die ("Could not open file: ".$file.". $!\n");
    flock(DATA, LOCK_EX) or die "Could not lock '".$file."' - $!\n";
    if (-z "".$file."") {
        print DATA "0\n";
    }
    while( @doc_write = <DATA> ) {
        chomp @doc_write;
        seek(DATA , 0 , SEEK_SET) or die "Cannot seek - $!\n";
        truncate(DATA, 0);
        my $range = 50;
        my $minimum = 100;
        my $random_number = int(rand($range)) + $minimum;
        my $time = time();
        my $packet = join (' ' , $time , $random_number);
        push(@doc_write , $packet);
        print Dumper (\@doc_write);
        foreach $_ (@doc_write) {
            print DATA $_ . "\n";
        }
        close (DATA) or die "Could not close '".$file."' - $!\n";
        return $packet;
    } # End of while (<DATA>)
} # End of sub add

while (sleep 1) {
    my $losses = &add();
    print "Added:" .$losses. "\n";
}

Three other scripts read the data from the same text file simultaneously and process it.

Each secondary script reads the data, processes it, and writes the result to a comparison file. The same process occurs in all three secondary scripts.

while (sleep 1) {
    # read all data into an array...
    # extract the last element...
    # process it...
    # write the extracted data to a secondary file for comparison purposes...
}
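A simplified sketch of that reader loop (not my exact code; process_packet() and the comparison file name are placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

my $source  = "test.txt";        # file written by the main script
my $outfile = "compare.txt";     # placeholder name for this reader's comparison file

while (sleep 1) {
    open my $in, '<', $source or die "Could not open '$source': $!\n";
    flock($in, LOCK_SH) or die "Could not lock '$source': $!\n";   # shared lock while reading
    my @lines = <$in>;
    close($in);                  # releases the lock

    next unless @lines;
    chomp(my $last = $lines[-1]);            # extract the last "time value" packet
    my $result = process_packet($last);      # placeholder for the real processing step

    open my $out, '>>', $outfile or die "Could not open '$outfile': $!\n";
    flock($out, LOCK_EX) or die "Could not lock '$outfile': $!\n";
    print {$out} $result . "\n";
    close($out);
}

sub process_packet { return $_[0] }          # stand-in: pass the packet through unchanged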

All scripts use flock, as I am trying to avoid the possibility of two processes colliding (one reading and one writing) on the text file.

By comparing the text files, I noticed that when I run two scripts together, approximately every 9-10 seconds one entry goes missing. The data losses get worse when I run all four scripts together.

As far as I can tell, the sleep 1 period causes most of the problems. I thought one second would be sufficient for all scripts to read and write the file, but unfortunately I also need to test the scripts at intervals of 1 second.

So I am wondering: is there a way to verify that all scripts have processed the text file before I push the new data? Or, if someone has a better idea for completing this task in a different way, I would be more than happy to hear about it.

Maybe my description is not well defined, so please do not hesitate to ask for further details about the parts that are not clear enough.

Again thank you all for your time and effort.


Re: Reading and writing data in a text file with several scripts simultaneously time synchronization problem
by boftx (Deacon) on May 18, 2014 at 23:28 UTC

    What you need is some sort of semaphore system. It would be simple to do this with MemCache or a database, but I suspect you want to avoid those, as well as named pipes or other IPC approaches.

    That said, if (and this is a very big if) you can assign individual script 'IDs' to the slaves, then each slave has a unique ID and you can use bitwise operations to extract the relevant ID bit from an integer stored in a 'lock' file, to determine whether data is ready to read or whether all slaves have finished reading.

    The logic would be something like this:

    If master: read the lock file. If it exists, write new data only if the value of the lock file indicates that all slaves have read the old data. If the file does not exist, write the new data file and set the lock file value to '0' to indicate that data is ready.

    If slave: read the lock file. If it does not exist, wait until it does. Use a bitwise AND (&) to determine whether the slave's bit is already set. If not, read the data and then rewrite the lock file after bitwise OR'ing (|) in the slave's bit. If the bit is already set, wait until the lock file value for the slave's bit is 0 again, indicating that new data is ready.

    In either case, you will need to flock the lock file before every read, and be sure to read immediately before any write to get the current value.

    If the slaves are also writing the data file, then the above logic would need to be tweaked.

    I understand that this is a very crude approach, but the basic logic should be a start.

    Edit: I just realized that this presumes that the master knows how many slaves there are in order to determine if all of them have read the data file. Give me another couple of shots of Scotch and I can probably come up with a more general solution. Also, the slave doesn't look for a value of '0' in the lock file, but rather, that the bit corresponding to the slave ID is 0.
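    A rough sketch of the slave side of that scheme, assuming each slave is started with a numeric ID on the command line and the lock file holds a single integer bitmask (the file name and polling interval are invented for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    my $lock_file = 'data.lock';            # hypothetical lock file holding one integer
    my $slave_id  = shift // 0;             # unique slave ID passed on the command line
    my $my_bit    = 1 << $slave_id;

    while (1) {
        if (-e $lock_file) {
            open my $fh, '+<', $lock_file or die "open '$lock_file': $!";
            flock($fh, LOCK_EX) or die "flock '$lock_file': $!";
            chomp(my $mask = <$fh> // 0);

            if (($mask & $my_bit) == 0) {               # our bit is clear: new data is ready
                # ... read and process the data file here ...
                seek($fh, 0, 0);
                truncate($fh, 0);
                printf {$fh} "%d\n", $mask | $my_bit;   # mark this slave as finished
            }
            close($fh);                                 # releases the flock
        }
        sleep 1;                                        # poll again later
    }

    The master would do the mirror image: it rewrites the data file and resets the mask to 0 only once the mask shows every slave's bit set, which, as noted above, means it has to know how many slaves exist.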


      To: boftx

      Your approach makes a lot of sense. The master (main script) should be aware of how many slave scripts are attempting to extract the data. As a second step it should know how many processes have already extracted the data and how many have not yet, before the next iteration.

      I am afraid that creating such a process is a bit out of my league. I was also thinking about the alternative approach that you proposed, using a database (e.g. MySQL). I think this approach would speed up the process a lot for all the scripts.

      Another alternative that I am considering is that, instead of having the main script read all the data of the file into an array, I could simply open the file in append mode. This approach should also significantly improve the processing speed.
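      A minimal sketch of that append-only variant (same test.txt; only the open mode and the amount written really change):

      use strict;
      use warnings;
      use Fcntl qw(:flock);

      my $file = "test.txt";

      sub add {
          my $packet = join ' ', time(), int(rand(50)) + 100;
          open my $fh, '>>', $file or die "Could not open '$file': $!\n";
          flock($fh, LOCK_EX) or die "Could not lock '$file': $!\n";
          print {$fh} $packet . "\n";          # append only the new packet
          close($fh) or die "Could not close '$file': $!\n";
          return $packet;
      }

      while (sleep 1) {
          print "Added:" . add() . "\n";
      }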

      But now I am facing two new challenges.

      Is there a way to measure the processing time for the main script to read and write the data to the file? I am mainly curious because I want to compare this time with the MySQL processing time and decide which path I should follow.

      The second and final challenge I am facing: is there a way to apply flock to INI files? I will be making all scripts use one INI file, each reading from a different location of course, but a conflict might still occur.

      Sorry for the continuous questions and messages; I am brainstorming every possible solution. I will encounter this kind of process several times in the future, so I am looking for the best possible solution to implement.

        I would go with the database if that is desirable since it might well make future reporting easier to manage. But you could always use the Benchmark module to test out different methods.
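        For example, a minimal Benchmark sketch comparing two write routines (the bodies here are placeholders to be replaced with the real file-based and DB-based code):

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Placeholders: replace the bodies with the real file and DB write routines.
        sub write_to_file { my $x = 0; $x += rand() for 1 .. 100; return $x }
        sub write_to_db   { my $x = 0; $x += rand() for 1 .. 200; return $x }

        # Run each routine 10_000 times and print a comparison table (rates and relative speed).
        cmpthese(10_000, {
            'flat file' => \&write_to_file,
            'database'  => \&write_to_db,
        });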

        As for flock and INI files, I presume you mean a typical foo.ini file for config variables. I don't see why you would have any conflicts unless the file is remote mounted over a network. Some versions of flock might have problems with that. YMMV.

        The main issue is to devise a decent semaphore scheme that removes any dependence upon sleep() to synchronize the various parts other than establishing a timing loop for polling the semaphore.

Re: Reading and writing data in a text file with several scripts simultaneously time synchronization problem
by QM (Parson) on May 19, 2014 at 15:34 UTC
    I hesitate to add this here, as perhaps you don't need the more complicated solution.

    I used to maintain a script that did reads/writes to a file (not a real database file, but similar use). Because of file locking issues across NFS, and the need to allow multiple readers and multiple writers, I grew my own file lock system.

    Each script instance wanting to read or write the "database" file would touch a file in a directory based on the name of the DB file. Touched filenames were chosen to sort lexicographically by timestamp, including the hostname and process number of the script, and whether it was requesting read or write access. For example, the touch filename template might be YYYYMMDDHHMMSS_read_hostname_ppppp. Any process wishing to read or write would first touch a filename appropriately, then read in and sort the existing filenames. If requesting a read lock, and all processes scheduled ahead of it are also reads, then start reading. If it's a write lock, wait until it is the oldest lock request. To release a lock, delete the file. This is a FIFO lock request queue.
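    As a very rough illustration of that scheme (the directory name is invented, and only the simpler write-lock case is shown), requesting and releasing a lock might look something like this:

    use strict;
    use warnings;
    use POSIX qw(strftime);
    use Sys::Hostname;
    use Time::HiRes qw(sleep);

    my $lock_dir = 'db.lock.d';               # hypothetical directory named after the DB file
    mkdir $lock_dir unless -d $lock_dir;

    # Touch a request file that sorts by timestamp, e.g. 20140519153400_write_myhost_12345
    my $mode  = 'write';                       # or 'read'
    my $stamp = strftime('%Y%m%d%H%M%S', localtime);
    my $mine  = sprintf '%s_%s_%s_%d', $stamp, $mode, hostname(), $$;
    open my $t, '>', "$lock_dir/$mine" or die "touch '$mine': $!";
    close($t);

    # Wait until our request is the oldest one in the FIFO queue.
    while (1) {
        opendir my $dh, $lock_dir or die "opendir '$lock_dir': $!";
        my @queue = sort grep { !/^\./ } readdir $dh;
        closedir $dh;
        last if @queue && $queue[0] eq $mine;  # head of the queue: the lock is ours
        sleep 0.2;
    }

    # ... read or write the "database" file here ...

    unlink "$lock_dir/$mine";                  # release the lock by deleting the request file

    A read request would additionally be allowed to proceed as long as everything ahead of it in @queue is also a read request; timeouts and crash cleanup are left out entirely.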

    You may want some mechanism for timeouts, abnormal terminations, etc.

    There are also some subtle considerations for any locking system, so do some research before deploying a critical system.


      Since he said he would like to use a DB, he can take full advantage of atomic inserts to implement the semaphore, thereby avoiding all flock issues by not using it. :)
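      One possible reading of that, sketched with DBI and an invented packets table (DBD::SQLite is assumed here purely to keep the example self-contained): each INSERT is atomic, so the writer and the readers never need flock at all.

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect('dbi:SQLite:dbname=test.db', '', '',
                             { RaiseError => 1, AutoCommit => 1 });
      $dbh->do('CREATE TABLE IF NOT EXISTS packets
                (id INTEGER PRIMARY KEY AUTOINCREMENT, ts INTEGER, value INTEGER)');

      # Writer side: every insert is atomic, so no file locking is needed.
      my $ins = $dbh->prepare('INSERT INTO packets (ts, value) VALUES (?, ?)');
      $ins->execute(time(), int(rand(50)) + 100);

      # Reader side: each reader remembers the last id it processed and asks only for newer rows.
      my $last_seen = 0;
      my $sel = $dbh->prepare('SELECT id, ts, value FROM packets WHERE id > ? ORDER BY id');
      $sel->execute($last_seen);
      while (my ($id, $ts, $value) = $sel->fetchrow_array) {
          # ... process the packet ...
          $last_seen = $id;
      }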


      To: QM

      It sounds like a good solution, but unfortunately I do not feel comfortable doing something like that. I am not that advanced, and I suspect I would not be able to complete it, at least not soon enough. But thanks a lot for your proposal; it is something that I could apply in the future.

Re: Reading and writing data in a text file with several scripts simultaneously time synchronization problem
by Monk::Thomas (Friar) on May 19, 2014 at 15:14 UTC
    Can you give an explanation of what you are trying to do? (An explanation of the actual problem, not the implemented solution.) Maybe there's a totally different solution?

      To: Monk::Thomas

      I tried to include a short description in the previous comment, but for simplicity I will paste it here as well.

      The whole story is that I am creating a main test script that produces random data and, for simplicity, stores it in a file. I want the secondary scripts to retrieve the data from that file, process it, and store their output in separate files for later comparison. I want to compare how much data each was able to process and how fast!

      In conclusion, even though the secondary scripts are only test scripts, I will choose just one of them to apply to my real goal. I am testing them to see how they perform under pressure, in order to choose the best one.

      This is the reason I need to store the data either in a file, for retrieving it later, or in a database. The other problem is that for my real goal several scripts will need to access this storage to retrieve the data.

      So I am trying to make my experiments as realistic as possible.

      I need to compare the processing times: initially with the file-based approach in different configurations, and possibly also with a short MySQL script, just to observe whether there is a large difference in processing time.

      P.S. Sorry for the long answer; I tried to include everything.

        Maybe a totally different solution to your problem is to pipe the data. Let the main script generate the test input and then write it to 3 different pipes (or however many test clients you have). Each test client would simply listen on STDIN and process any incoming data packet.

        This would neatly sidestep the whole issue of file locking and concurrency. However, I really can't tell whether this is a suitable solution for your problem, even after reading the description you provided.

        simplified code

        open my $fh1, '|-', 'bin/testclient1';   # '|-' opens a pipe for writing to the child's STDIN
        open my $fh2, '|-', 'bin/testclient2';
        open my $fh3, '|-', 'bin/testclient3';

        my $time1 = 0;
        my $time2 = 0;
        my $time3 = 0;

        while (<condition>) {                    # placeholder loop condition
            my $testdata = make_foo();           # placeholder test-data generator
            my $time_a = time;
            print {$fh1} $testdata;
            my $time_b = time;
            print {$fh2} $testdata;
            my $time_c = time;
            print {$fh3} $testdata;
            my $time_d = time;
            $time1 += $time_b - $time_a;
            $time2 += $time_c - $time_b;
            $time3 += $time_d - $time_c;
        }

        print "method 1: $time1 seconds\n";
        print "method 2: $time2 seconds\n";
        print "method 3: $time3 seconds\n";
        exit;
        P.S.: I always get '-|' and '|-' wrong. You definitely need to check the syntax.