delirium has asked for the wisdom of the Perl Monks concerning the following question:

A while back I wrote an FTP engine in Perl for my company's e-commerce group. The main goals were to make a system with good session logging, keep a history of file names and download times, separate code from logon credentials and filenames, and to be able to run multiple sessions simultaneously.

The modules I had available in addition to core modules were Net::FTP, Parallel::ForkManager (thank god), and IO::Scalar. There was no procedure for adding new modules to the production system, so I threw together a script using Net::FTP for the comm logic, Parallel::ForkManager for the multiple simultaneous sessions, and IO::Scalar to capture STDOUT and STDERR from each fork to create my session logs.

My biggest problem turned out to be the session history. I had a hash of hashes, where each top-level key was the name of a session profile and its value was a hash describing that session's history. I ended up using Data::Dumper to dump the hash to a file, a sample of which looks like this:

$VAR1 = {
  'Session 1' => {
    'last' => 1080139082,
    'files' => [
      'File1.txt 1079948073 @1079966101',
      'File2.txt 1080035083 @1080053101',
      'File3.txt 1080121051 @1080139081'
    ],
    'lastfailmsg' => '+',
    'lastfailtime' => 0
  },
  'Session 2' => {
    'last' => 1080129127,
    'files' => [
      'File1.txt @1079956803',
      'File2.txt @1080043204',
      'File3.txt @1080129100'
    ],
    'lastfailmsg' => '+',
    'lastfailtime' => 0
  }
};

With a little fiddling, I was able to use do(file) to read the hash back into memory. I was unhappy with this setup, but managed to make it work. Since each session was its own fork, I had a logistics problem in updating the history file. After a session finished, I copied the hash tree for just that session to a temporary hash, reloaded the history file into memory, merged the changes, and dumped the new hash back to file. I used some simple file locking to battle race conditions. It works well, despite my apprehension.
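For reference, the Dumper-then-do round trip looks like this in miniature (the file name and sample data here are made up for illustration; the real script dumps the whole %sess_hist):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical file name and data, for illustration only.
my $hist_file = "/tmp/sess_hist.$$.dump";
my %sess_hist = (
    'Session 1' => { last => 1080139082, files => [ 'File1.txt @1079966101' ] },
);

# Dump: Data::Dumper emits valid Perl source that assigns to $VAR1.
{
    local $Data::Dumper::Indent = 1;
    open my $hf, '>', $hist_file or die "Can't write $hist_file: $!";
    print $hf Dumper \%sess_hist;
    close $hf;
}

# Reload: do() compiles and runs the file; the value of its final
# statement ($VAR1 = { ... }) is the hashref itself.
my $loaded = do($hist_file);
die "Can't parse $hist_file: " . ($@ || $!) unless ref $loaded eq 'HASH';
print $loaded->{'Session 1'}{last}, "\n";   # prints 1080139082
```

The fragility is visible here: do() will happily evaluate a half-written file, which is exactly the failure mode described below.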

Thankfully, a procedure is now in place to install CPAN modules in the production environment. So now I want to replace the following code:

sub merge_hist_changes {
    # Merges current session's history changes into %sess_hist.

    ## Step 1 - Create copy of $session's hash tree
    my %temp_hash = %{$sess_hist{$session}};

    ## Step 2 - Filter out files downloaded more than $hist_days days ago
    my $old = time - ( 86400 * $hist_days );   # 86400 seconds in a day
    @{$temp_hash{files}}   = grep { /@(\d+)$/; $1 > $old } @{$temp_hash{files}};
    @{$temp_hash{uploads}} = grep { /@(\d+)$/; $1 > $old } @{$temp_hash{uploads}};

    ## Step 3 - Get an exclusive flock on $hist_file.l. This is the critical
    ## step that prevents other forks from updating until the current
    ## $session's info gets updated.
    open HFL, '>', "$hist_file.l";
    unless (flock HFL, 2) {    # 2 == LOCK_EX
        my $failstr = "Can't get lock on $hist_file.l, changes to DB unsaved\n";
        $failstr .= "History tree for $session :\n" . Dumper \%temp_hash;
        &pager_alert($failstr);
        exit;
    }

    ## Step 4 - Get new %sess_hist from disk (like &parse_hist)
    local $/ = undef;
    if (-s $hist_file) {
        unless ( %sess_hist = %{do($hist_file)} ) {
            my $failstr = "Can't parse history file, changes to DB unsaved\n";
            $failstr .= "History tree for $session :\n" . Dumper \%temp_hash;
            &pager_alert($failstr);
            exit;
        }
    }

    ## Step 5 - Change $session's hash pointer to refer to %temp_hash
    $sess_hist{$session} = \%temp_hash;

    ## Step 6 - Dump %sess_hist.
    local $Data::Dumper::Indent = 1;
    open HF, '>', $hist_file;
    print HF Dumper \%sess_hist;
    close HF;
    close HFL;   # Releases flock and lets next child process update $hist_file
}

...with a system that is more reliable. Currently, if a single write error occurs, I could potentially lose a day's worth of history information.

Right now I'm just looking at possible solutions. Storable and Tie::TwoLevelHash are options, as is restructuring the hash into something that could fit better into database tables and using DBI or something similar. What approach would you guys take?
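For what a Storable-based replacement might look like: Storable ships with lock_store and lock_retrieve, which take an flock for the duration of each individual call. The read-merge-write cycle would still need its own exclusive lock around the whole sequence, just like the current $hist_file.l scheme. A minimal sketch (file name and session data are made up for illustration):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Storable qw(lock_store lock_retrieve);

# Hypothetical paths and session data, for illustration only.
my $hist_file = "/tmp/sess_hist.$$.stor";
my $session   = 'Session 1';
my %temp_hash = ( last => time, files => [ 'File1.txt @1079966101' ] );

# The lock file must cover the whole read-merge-write sequence;
# lock_store/lock_retrieve only lock their own individual calls.
open my $hfl, '>', "$hist_file.l" or die "Can't open lock file: $!";
flock $hfl, LOCK_EX or die "Can't get lock on $hist_file.l: $!";

my %sess_hist = -s $hist_file ? %{ lock_retrieve($hist_file) } : ();
$sess_hist{$session} = \%temp_hash;
lock_store \%sess_hist, $hist_file;

close $hfl;   # releases the flock for the next child process
```

One advantage over the Dumper/do round trip: Storable croaks at retrieve time on a truncated or corrupt file, so a bad write gets caught and can trigger the pager alert instead of silently feeding garbage back into memory.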

Replies are listed 'Best First'.
Re: Replacing Data::Dumper / do(file) on multi-fork process
by amw1 (Friar) on Mar 24, 2004 at 19:51 UTC
    If you don't care how easy it is for other languages to access the data (I don't know how easy it is to deal with Storable-frozen data outside of Perl), then what I've done is create a database table (whatever db you want) with an ID field, possibly some other meta information, and then a blob that holds the result of a Storable::freeze call. You'd then be able to do something like...
    sub SaveSession {
        # this could self-generate the session id using a db's
        # autoincrement feature, with UUIDs, or a different
        # homebrewed method
        my $session_id        = shift();
        my $session_struct    = shift();
        my $serialized_struct = Storable::freeze($session_struct);

        # pseudo sql:
        #   insert into session_table (id, session)
        #     values ($session_id, $serialized_struct)
        # handle errors
        return;
    }

    sub GetSession {
        my $session_id = shift();

        # pseudo sql:
        #   select * from session_table where id = $session_id
        #   $row = sql result
        # handle errors
        return Storable::thaw($row->{'session'});
    }
    You now have a reliable way to store and retrieve the data structures you want, so you can merge 'em together however you want them. If you didn't want to deal with a database, you could replace the db calls with something that writes a file based on the session id you care about into a single directory. (Waving hands about unique filenames; I'd look at UUIDs to reliably name the files.) You could then build a mechanism that will open and thaw a file based on the id you pass it, or open and thaw all files, etc. Then you can do your merge in memory to display the combined results and not worry about having to continuously update the merged results.
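    The file-per-session variant might look something like this sketch (the directory layout, file naming, and the absence of locking are all placeholders; a real version would want the UUID naming and error handling hand-waved above):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use File::Spec;

# Hypothetical directory; one frozen file per session id.
my $dir = "/tmp/sessions.$$";
-d $dir or mkdir $dir or die "Can't mkdir $dir: $!";

sub save_session {
    my ($id, $struct) = @_;
    my $path = File::Spec->catfile($dir, "$id.frozen");
    open my $fh, '>', $path or die "Can't write $path: $!";
    binmode $fh;                 # frozen data is binary
    print $fh freeze($struct);
    close $fh;
}

sub get_session {
    my ($id) = @_;
    my $path = File::Spec->catfile($dir, "$id.frozen");
    open my $fh, '<', $path or die "Can't read $path: $!";
    binmode $fh;
    local $/;                    # slurp the whole file
    return thaw(<$fh>);
}

save_session('Session 1', { last => 1080139082, files => ['File1.txt @1079966101'] });
my $sess = get_session('Session 1');
print $sess->{last}, "\n";   # prints 1080139082
```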
Re: Replacing Data::Dumper / do(file) on multi-fork process
by saintmike (Vicar) on Mar 24, 2004 at 19:18 UTC
    If you're keeping all data of all sessions in one container, you need something as reliable as a database (including locking, crash recovery, backups, etc.) -- MySQL, PostgreSQL, even SQLite come to mind.

    Not sure how long you need to keep the session data around, but one different approach would be using Cache::File -- that's usually a great way to manage session data, using the session key as index, it'll use a different file for every session.