in reply to yet another thread question: is Symbol::gensym threadsafe?

To the best of my knowledge, IO::Socket::INET and all its dependencies are thread-safe. I've written and tested several dozen threaded scripts that use them and never encountered a problem that I couldn't eventually track down to something else.

However, looking at the extract you posted which I've never had occasion to look at before, I'm not particularly impressed, and I do see a possible source of conflict/error.

If Symbol is loaded before the threads are spawned, as it normally will be, then the package global $genseq will be cloned into each thread that is created. That means each thread will have its own sequence counter and will therefore generate duplicate fully-qualified names for the globs it creates. As best I can tell that shouldn't be a problem, since each sequence lives in a separate interpreter and the names should never come into conflict. But it still tweaks my radar.
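For the curious, the cloning behaviour can be demonstrated in a few lines (a hedged sketch: the counter $genseq here is a stand-in for Symbol's, and the fallback branch for perls built without ithreads is mine):

```perl
use strict;
use warnings;
use Config;

# Stand-in for Symbol's counter: a package global that exists before
# any threads are spawned.
our $genseq = 5;

my @results;
if ( $Config{useithreads} ) {
    require threads;
    # Each new thread receives its own clone of $genseq == 5 ...
    my @thr = map { threads->create( sub { ++$genseq } ) } 1 .. 3;
    # ... so every thread "generates" the same next sequence value.
    @results = map { $_->join } @thr;
}
else {
    @results = ( 6, 6, 6 );    # no ithreads in this perl; nothing to demo
}
print "@results\n";            # 6 6 6 -- duplicate per-thread counters
```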

It seems to me that it would be better to forgo the naming of the globs completely and just return the address of an anonymous glob created by localising a single well-known name:

sub gensym { return \do{ local *GLOB; *GLOB }; }
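A quick check (renamed to gensym_local here so it doesn't clash with Symbol's own export) confirms that each call hands back a distinct anonymous glob that works anywhere a filehandle is expected:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Same technique as above, under a hypothetical name to avoid clashing
# with Symbol::gensym.
sub gensym_local { return \do { local *GLOB; *GLOB } }

my $g1 = gensym_local();
my $g2 = gensym_local();
printf "distinct globs: %s\n", refaddr($g1) != refaddr($g2) ? 'yes' : 'no';

# The anonymous glob is usable as a filehandle (an in-memory file here):
open $g1, '<', \"hello\n" or die $!;
my $first = <$g1>;
close $g1;
print $first;    # hello
```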

You could try temporarily substituting that for the code you posted and see if it makes any difference, but I seriously doubt it will. You've rather glossed over where I think the problem really lies. I.e.

the caching process simply doesn't receive any data at all - which I tried to bypass by locally undef'ing $/ and $\, and although I rather consider this as magic, it apparently works. Apart from that, sometimes the data get merged in ways that are plainly impossible. For instance, chunks of data from one entity seem to appear somewhere else, where they don't belong.

In particular, "the caching process cannot be the reason for these mistakes ..." smacks greatly of "famous last words".

I think if you posted (preferably cut down) working code that demonstrates how your caching process works, and resolved the problems described in the above paragraph, then you'd probably fix the overall problem also. I always, always suspect "magic" code :)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: yet another thread question: is Symbol::gensym threadsafe?
by grinder (Bishop) on Oct 24, 2007 at 07:19 UTC
    return the address of an anonymous glob

    I'd be vaguely uncomfortable with the idea that memory could be recycled and that at some point you would generate the same symbol name more than once.

    I would be inclined to extend the symbol name to include the current thread id:

    use Config ();

    sub gensym () {
        my $id   = $Config::Config{useithreads} ? threads->tid . '-' : '';
        my $name = "GEN$id" . $genseq++;
        my $ref  = \*{$genpkg . $name};
        delete $$genpkg{$name};
        $ref;
    }

    • another intruder with the mooring in the heart of the Perl

      But, as the names are immediately deleted from the symbol table each time:

      delete $$genpkg{$name};

      before the reference is returned to the caller, they can never again be referenced by name, so I don't think there is any possibility for confusion there. Hence, as the sequenced names are never used beyond the point of creation, there seems to be no reason to sequence them in the first place when local can give us a new, unnamed glob whenever we want one.

      I'm guessing the current implementation is simply a part of history from before local was available, and definitely predating me.


        simply a part of history from before local was available

        No. local predates modules and so also predates the Symbol module.

        The generated name can be output when the file handle is dumped and it is nice to be able to distinguish between two file handles in such debug output. Hence unique names are generated.

        You also appear to have disproven your own argument that the numbers overlapping between threads might cause a problem, especially since your proposed solution to that problem is to eliminate the numbers. :)

        - tye        

Re^2: yet another thread question: is Symbol::gensym threadsafe?
by TOD (Friar) on Oct 24, 2007 at 09:48 UTC
    I really appreciate your offer of help, because this problem is literally starting to drive me mad...

    OK, first: there is an application-global static hashtable named %appdata, located in a package called SOD::Config, which gets required by each of the processes involved. The hashtable is always made accessible by the mantra:
    our %appdata; *appdata = \%SOD::Config::appdata;
    In a frontend script, as executed by ModPerl::Registry, you can find e.g. the following lines:
    my $planet = SOD::Planet->new($dbh, $some_id);
    my $stats  = $planet->game_stats->record;
    SOD::Planet then says:
    $self->{GAMESTATS} ||= SOD::SharedObject::PlanetStatus->new($self->{ID});
    the PlanetStatus constructor calls the constructor from its super class SOD::SharedObject with some individual arguments. The super class connects to the cache:
    sub new {
        my ($class, $type, $id) = @_;
        die "You must construct a SharedObject with a type" unless $type;
        my $self = {
            TYPE     => $type,
            ID       => $id,
            MODIFIED => 0,
            RECORD   => {}
        };
        bless $self, $class;
        # retrieve data from the shared objects cache:
        if ($id) {
            my ($sock, $req, $res);
            $sock = IO::Socket::INET->new(
                PeerAddr => $appdata{SOBJIP},      # 127.0.0.1
                PeerPort => $appdata{SOBJPORT},    # 8000
                Proto    => 'tcp',
                Type     => SOCK_STREAM,
                Reuse    => 1) || die $!;
            # SOBJCOMMAND == 6, SOBJCONTENT == 8
            $req = SOD::Session::Client::Request->new($appdata{SOBJCOMMAND}, $appdata{SOBJCONTENT});
            $res = SOD::Session::Client::Response->new($appdata{SOBJCOMMAND}, $appdata{SOBJCONTENT});
            unless ($req->send($sock, 'get ', { 'type' => $type, 'id' => $id })) {
                die SOD::Session::Error::get($req->error->get) . "\t" . $req->error->msg;
                $sock->close || die $!;
            }
            unless ($res->receive($sock)) {
                die SOD::Session::Error::get($res->error->get) . "\t" . $res->error->msg;
            }
            $sock->close || die $!;
            undef $sock;
            $self->{RECORD} = $res->content;
        }
        $self;
    }
    Don't worry about the 'Session' in there; both daemons use the same API. If we put data on the cache, the respective line reads:
    unless ($req->send($sock, 'save ', {
            'type'   => $self->{TYPE},
            'id'     => $self->{ID},
            'object' => $self->{RECORD} })) {
        [...]
    the real connection is performed by the ...Client::Request class:
    sub send {
        my $self = shift || return 0;
        return 0 unless ref $self;
        unless (@_ == 3) {
            $self->{ERROR}->set(ILL_NUM);
            return 0;
        }
        my $sock;
        ($sock, $self->{COMMAND}, $self->{CONTENT}) = @_;
        $self->{ERROR}->set(ILL_ARG1) if ref $self->{COMMAND};
        $self->{ERROR}->set(ILL_ARG2) unless ref $self->{CONTENT};
        return 0 if $self->{ERROR}->get;
        $self->{CONTENT} = freeze $self->{CONTENT};
        # COMMANDSIZE is what SOBJCOMMAND was for the caller,
        # CONTENTSIZE refers to SOBJCONTENT in this example (just
        # a primitive protocol)
        my $str = sprintf("%" . $self->{COMMANDSIZE} . "s", $self->{COMMAND})
                . sprintf("%0" . $self->{CONTENTSIZE} . "d", CORE::length($self->{CONTENT}))
                . $self->{CONTENT};
        local $\ = undef;    # the magic
        unless ($sock->send($str)) {
            $self->{ERROR}->set(SOCK_SEND, $!);
            return 0;
        }
        1;
    }
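    To make the framing concrete, here is what one request looks like on the wire under the values quoted above (COMMANDSIZE == 6, CONTENTSIZE == 8; the payload is a hypothetical stand-in for the Storable-frozen content):

```perl
use strict;
use warnings;

# Hypothetical payload standing in for the frozen structure.
my $command = 'get ';
my $payload = 'FROZEN';

# 6-char right-padded command, 8-digit zero-padded length, then payload.
my $frame = sprintf( "%6s", $command )
          . sprintf( "%08d", length $payload )
          . $payload;

print "[$frame]\n";    # [  get 00000006FROZEN]
```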
    So much for the client side; now for the server. SOD::Session::Server::Request receives the request:
    sub receive {
        my $self = shift;                 # Perl OO mantra ...
        return undef unless ref $self;    # ... continued
        unless (@_) {
            $self->{ERROR}->set(ILL_NUM);
            return 0;
        }
        my $sock = shift;
        local $/ = undef;                 # again the magic
        unless (defined($sock->recv($self->{COMMAND}, $self->{COMMANDSIZE}))
             && defined($sock->recv($self->{CONTENT_LENGTH}, $self->{CONTENTSIZE}))
             && defined($sock->recv($self->{CONTENT}, $self->{CONTENT_LENGTH}))) {
            $self->{ERROR}->set(SOCK_RECV);
            return 0;
        }
        1;
    }
    Supposing the client wants to put data on the cache, it sends a 'save' request, which for the server is a command:
    eval "_".$req->command;
    which for our example means that we call the _save() routine:
    sub _save {
        my $cont = $req->content;
        $logs->enqueue("Cacheing $cont->{'type'} $cont->{'id'}") if $DEBUG;
        my $time = $::server_time;
        if ($cont->{'type'} eq 'planet') {
            lock %planets;
            if (exists $planets{$cont->{'id'}}) {
                unless ($planets{$cont->{'id'}}->{'deleted'}) {
                    my $old = thaw $planets{$cont->{'id'}}->{'object'};
                    $old = __merge($old, $cont->{'object'});
                    $planets{$cont->{'id'}}->{'object'}   = freeze $old;
                    $planets{$cont->{'id'}}->{'atime'}    = $time;
                    $planets{$cont->{'id'}}->{'modified'} = 1;
                }
            }
            else {
                # this is the point our last discussion was about:
                my %el : shared = (
                    'object'   => freeze($cont->{'object'}),
                    'atime'    => $time,
                    'modified' => 1,
                    'deleted'  => 0
                );
                $planets{$cont->{'id'}} = \%el;
            }
        }
        [...]
    }
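    As an aside, the eval "_".$req->command dispatch could also be written as a dispatch table, which avoids string-eval'ing data that arrived over the network. A sketch (the handlers here are hypothetical stubs, not the daemon's real routines):

```perl
use strict;
use warnings;

# Stub handlers standing in for the daemon's real _save/_get routines.
sub _save { return 'saved' }
sub _get  { return 'fetched' }

my %handlers = ( 'save' => \&_save, 'get' => \&_get );

my $command = 'save ';                   # as received from the client
( my $cmd = $command ) =~ s/\s+\z//;     # trim the protocol padding

my $handler = $handlers{$cmd}
    or die "unknown command '$cmd'";
my $outcome = $handler->();
print "$outcome\n";                      # saved
```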
    The beating heart of all that stuff is the merger, in-place and straightforward; I posted the snippets from the ...::Request classes only to make the example easier to understand. This Session::... API has been working for several years now, and with the session daemon I never ran into complications like these. That's why I assume that undef'ing the record separator variables really is magic. I mean, as the socket is a blessed symref, it would even be possible to binmode() it. But what for?
    --------------------------------
    masses are the opiate for religion.

      The first thing I notice is that you are assuming that when you do $sock->recv( $buf, 10 ), that provided the call doesn't return undef, then you will have received the number of bytes requested. I think that this is mistaken and that recv will fetch as many bytes as are available up to the number requested, but it may be less. For your short command packets this probably won't occur, but when you are sending large blocks of serialised content, even though you are sending it as a single chunk, the tcp protocol is free to break it up into a number of packets for transmission. That means that your code may be receiving partial transmissions.

      You aren't going to want to hear this, but all in all, your 'protocol' handling seems to be very sloppy, leaving numerous windows of opportunity for failure. At the very least you need to have your recv code check the number of bytes received and loop back for more if it didn't get enough. You should probably also be checking the number of bytes sent, though that's generally less of a concern.
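      A minimal sketch of such a loop (the name and error handling are mine, not from the posted code):

```perl
use strict;
use warnings;

# Read exactly $want bytes from $sock, looping because a single recv()
# may return fewer bytes than requested. Returns undef on error or if
# the peer closes the connection before enough bytes arrive.
sub recv_exactly {
    my ( $sock, $want ) = @_;
    my $buf = '';
    while ( length($buf) < $want ) {
        my $chunk;
        defined $sock->recv( $chunk, $want - length $buf )
            or return undef;                      # socket error
        return undef if length($chunk) == 0;      # peer closed early
        $buf .= $chunk;
    }
    return $buf;
}
```

      The same check applies on the sending side: compare the return value of send() against the number of bytes you asked it to send, and loop over the remainder if necessary.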

      Possibly the best solution would be to ditch your (from your own comments:) "primitive protocol", and switch to using a known good implementation of a well tested protocol. From what I can see, HTTP GET and the lesser known PUT would meet most of your requirements.

      Whilst it may seem like a lot of work, I think switching your code to use HTTP::Daemon, with its long pedigree, would probably be easier than trying to fix up your home-brew protocol implementation to cover all the possibilities and caveats that HTTP has already got covered and Gisle Aas has already implemented.
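      For what it's worth, the skeleton of such a server is small. A sketch only: the port choice, URL scheme, and response bodies are placeholders, and the real handlers would talk to the shared cache:

```perl
use strict;
use warnings;
use HTTP::Daemon;      # part of the libwww-perl distribution
use HTTP::Response;

# Let the OS pick a free port on the loopback interface.
my $d = HTTP::Daemon->new( LocalAddr => '127.0.0.1' )
    or die "cannot listen: $!";
print "cache daemon ready at ", $d->url, "\n";

sub serve {
    while ( my $c = $d->accept ) {
        while ( my $r = $c->get_request ) {
            if ( $r->method eq 'GET' ) {
                # e.g. GET /planet/42 -> return the frozen object
                $c->send_response(
                    HTTP::Response->new( 200, 'OK', undef, 'frozen-object' ) );
            }
            elsif ( $r->method eq 'PUT' ) {
                # $r->content holds the serialised object from the client
                $c->send_response( HTTP::Response->new( 204, 'No Content' ) );
            }
            else {
                $c->send_error(405);    # method not allowed
            }
        }
        $c->close;
    }
}

serve() if @ARGV && $ARGV[0] eq '--serve';    # don't block when sourced
```

      HTTP::Daemon also gives you content-length handling, persistent connections, and well-defined error responses for free, which is exactly the bookkeeping the home-brew protocol is getting wrong.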

      You might also consider something like HTTP::Daemon::Threaded, though that is much less well proven and so something of an unknown quantity.

      Also, from my first glance at your __merge() routine, it looks to me as if it is, at the very best, doing things the hard way in terms of walking the parallel structures to discover differences. I think it could be made considerably simpler. And faster. I also think I see possibilities for corruption, but I need to sit down and go through it again when I've got nothing else on the go. I'll get back to you on that.

      Sorry that this is rather negative, but you really do need to fix your protocol handling, and sooner rather than later.


        You are definitely right. This Session::... stuff is rather ancient code and has never been changed since, and the reason why it has always worked before might simply be that session objects never become large. But planets and fleets data do, and I'm quite sure that's where the difficulties came from. I modified the respective parts as follows:
        sub receive {
            my $self = shift;                 # Perl OO mantra ...
            return undef unless ref $self;    # ... continued
            unless (@_) {
                $self->{ERROR}->set(ILL_NUM);
                return 0;
            }
            my $sock = shift;
            local $/ = undef;
            unless (defined $sock->recv($self->{COMMAND}, $self->{COMMANDSIZE})
                 && defined $sock->recv($self->{CONTENT_LENGTH}, $self->{CONTENTSIZE})) {
                $self->{ERROR}->set(SOCK_RECV);
                return 0;
            }
            my $cont = '';
            $self->{CONTENT} = '';
            while (length $self->{CONTENT} < $self->{CONTENT_LENGTH}) {
                unless (defined $sock->recv($cont, $self->{CONTENT_LENGTH})) {
                    $self->{ERROR}->set(SOCK_RECV);
                    return 0;
                }
                $self->{CONTENT} .= $cont;
                $cont = '';
            }
            1;
        }
        and, although the loop doesn't seem to be necessary:
        sub send {
            my $self = shift;
            return undef unless ref $self;
            my $sock = shift || $self->{ERROR}->set(ILL_NUM);
            return 0 if $self->error->get;
            $self->{CONTENT_LENGTH} = length $self->{CONTENT};
            my $str = sprintf("%$self->{CODESIZE}s", $self->{CODE})
                    . sprintf("%$self->{CONTENTSIZE}d", $self->{CONTENT_LENGTH})
                    . $self->{CONTENT};
            my $length = length $str;
            local $\ = undef;
            my ($sent, $tsent) = (0, 0);
            while ($sent < $length) {
                # resume from the bytes that have not gone out yet
                unless ($tsent = $sock->send(substr($str, $sent))) {
                    $self->error->set(SOCK_SEND, $!);
                    return 0;
                }
                $sent += $tsent;
            }
            1;
        }
        I'm quite sure that this will fix the problem, and once again you've nearly saved my life. :)