PerlMonks  

WWW::Curl and making it thread safe

by lsuchocki (Novice)
on Feb 18, 2016 at 03:18 UTC ( [id://1155516]=perlquestion )

lsuchocki has asked for the wisdom of the Perl Monks concerning the following question:

Oh Perl Monks, I plead for your insight..

My app uses WWW::Curl::Multi (libcurl) to perform web tasks which are handled, among many other things, in a super loop. I've recently added a piece of code which fork()s to perform an occasional long-running task, which I do not want to block my main loop with.

Works great. On Linux.

Upon testing the same code under Windows (Strawberry Perl) with the newly added code which forks, perl crashes. Debugging shows me if I don't fork(), it doesn't crash, and if I don't "WWW::Curl::Multi->new();" it doesn't crash.

Knowing that Windows emulates fork() with interpreter threads, I'm guessing the WWW::Curl library is not thread safe.

This was confirmed with the following bare-minimum test cases:
Windows (wine):

use WWW::Curl::Multi;

my $x = WWW::Curl::Multi->new();
if (fork == 0) {
    #Do stuff NOT related to WWW::Curl here..
    print STDERR "thread sleeping\n";
    sleep 1;
    print STDERR "thread exiting\n";
    exit;
}
print STDERR "main sleeping\n";
sleep 3;
print STDERR "main exiting\n";
exit 0;

Which yields:
main sleeping
thread sleeping
thread exiting
Free to wrong pool 249ea0 not 241ba8 at testcrash.pl line 9.
wine: Unhandled page fault on write access to 0x00000000 at address 0x713c9106 (thread 0031), starting debugger...

And under Linux:

use threads;
use WWW::Curl::Easy;

my $x = WWW::Curl::Easy->new();
my $t = threads->create(sub{
    print STDERR "thread sleeping\n";
    #Do stuff NOT related to WWW::Curl
    sleep 1;
    print STDERR "thread exiting\n";
});
print STDERR "main sleeping\n";
sleep 3;
$t->join();
print STDERR "main exiting\n";
exit 0;

Which yields (under GDB):

[New Thread 0x7ffff1689700 (LWP 12664)]
main sleeping
thread sleeping
thread exiting
[Thread 0x7ffff1689700 (LWP 12664) exited]
main exiting

Program received signal SIGSEGV, Segmentation fault.
0x0000003f3fe39d63 in Curl_splay () from /lib64/libcurl.so.4

Now, I don't need any curl resources within my child/thread, it's only used in the main program (technically another .pm), but the object created is globally scoped and has a member with a WWW::Curl object. So I don't think I can keep it hidden from my child/thread. Curl itself is supposed to be thread-safe, with the exception of the curl_global_init. I'm pretty sure that's not the issue though, since the child does not call it. Unless it's being automatically called a second time by the BOOT: XS syntax on the thread creation?

I believe that when I fork under Win32 (or create a thread under Linux), my object may be free()d by the DESTROY XS methods when I exit/join the thread? And I probably need to prevent curl_global_cleanup() from being called from the .pm END{} block when the thread exits... ??

So... this is where it gets fuzzy for me. I'm a strong C programmer, but there's a learning curve with perl's XS. Are there any other modules out there which properly deal with these sort of situations which I may learn / copy from? I've spent all day reading perlthrtut, perlguts, perlmod, Example::CLONE and others and ended up with a nasty headache..
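One mechanism that perlmod documents for this kind of double-free is CLONE_SKIP: if a class defines it and it returns true, objects of that class are not cloned into new interpreter threads, so their DESTROY does not run a second time when the thread ends. The following is a minimal sketch using only core modules. My::Handle is a hypothetical stand-in for an XS-backed class; it is not part of WWW::Curl, and whether defining CLONE_SKIP for WWW::Curl::Easy or WWW::Curl::Multi is enough to stop the crashes above is an untested assumption.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared counter, so DESTROY calls from any thread are visible in the main thread.
my $destroy_count :shared = 0;

# Hypothetical stand-in for an XS-backed class such as WWW::Curl::Easy.
package My::Handle {
    sub new        { my ($class) = @_; return bless { handle => 1 }, $class }
    sub CLONE_SKIP { return 1 }   # do not clone objects of this class into new threads
    sub DESTROY    { lock($destroy_count); $destroy_count++; return }
}

my $obj = My::Handle->new();

# The thread never touches $obj; it only starts and exits.
threads->create(sub { return })->join();
my $after_join = $destroy_count;

undef $obj;                       # the one real destructor call, in the owning thread
my $after_undef = $destroy_count;

print "DESTROY calls after join: $after_join, after undef: $after_undef\n";
```

Removing the CLONE_SKIP line should make the counter increase once more at thread exit, which is the double DESTROY described above. This sketch does not address curl_global_cleanup() in the END block.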

Most humbly,
--Luke

update:

Is the solution simply to (in reference to the above examples):

use threads;
use threads::shared;
...
share($x);

This seems to prevent perl from crashing, and it prevents the double DESTROY. However, with the fork() example under Windows, the END{} curl_global_cleanup() is called twice (once in the thread and once in the parent), whereas with threads->create()->join() it is only called once.

Is this a safe workaround to what I'm seeing?

Replies are listed 'Best First'.
Re: WWW::Curl and making it thread safe
by marioroy (Prior) on Feb 18, 2016 at 11:36 UTC

    Greetings lsuchocki, and welcome to the Monastery.

    Try having the worker exit via threads->exit on Windows.

    use threads;
    use WWW::Curl::Multi;

    my $x = WWW::Curl::Multi->new();
    if (fork == 0) {
        #Do stuff NOT related to WWW::Curl here..
        print STDERR "thread sleeping\n";
        sleep 1;
        print STDERR "thread exiting\n";
        threads->exit(0) if ($^O eq 'MSWin32' && threads->can('exit'));
        exit(0);
    }


    If the issue persists, try exiting via POSIX::_exit. That avoids all END and destructor processing for non-threads spawned via fork.

    use POSIX ();
    use WWW::Curl::Multi;

    my $x = WWW::Curl::Multi->new();
    if (fork == 0) {
        #Do stuff NOT related to WWW::Curl here..
        print STDERR "thread sleeping\n";
        sleep 1;
        print STDERR "thread exiting\n";
        POSIX::_exit(0);
    }


    Regards, Mario
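The effect of POSIX::_exit on destructors can be checked without WWW::Curl. The sketch below assumes a platform with a real fork() (it does not model the Windows fork emulation). My::Handle and the temporary log file are hypothetical, introduced only to count DESTROY calls across processes.

```perl
use strict;
use warnings;
use POSIX ();
use File::Temp ();

# Hypothetical class whose destructor appends one line to a log file.
package My::Handle {
    sub new { my ($class, $file) = @_; return bless { file => $file }, $class }
    sub DESTROY {
        my ($self) = @_;
        open my $out, '>>', $self->{file} or return;
        print {$out} "DESTROY in pid $$\n";
        close $out;
        return;
    }
}

my (undef, $logfile) = File::Temp::tempfile();
my $obj = My::Handle->new($logfile);

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: leave immediately, skipping END blocks and object destructors.
    POSIX::_exit(0);
}
waitpid($pid, 0);

undef $obj;    # the parent's copy is destroyed normally

open my $in, '<', $logfile or die "cannot read $logfile: $!";
my @calls = <$in>;
close $in;
unlink $logfile;

print scalar(@calls), " DESTROY call(s) recorded\n";
```

Replacing POSIX::_exit(0) with a plain exit in the child should add a second line to the log, since the child then destroys its own copy of the object.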

Re: WWW::Curl and making it thread safe
by Anonymous Monk on Feb 18, 2016 at 03:28 UTC
    Try forking/threading early before curl is loaded
    use threads;
    async sub {
        print join q/ /, threads->tid, "\n";
        sleep 3;
        return;
    };
    require WWW::Curl::Easy;
    ...;
    exit 0;

      Correct me if I'm wrong, but in that case I would lose the ability to create a thread (or threads) "on-demand", with the actual data required for the thread to chew on.

      It would be a long-running thread where I would have to manage queues into that single thread, and either handle the data sequentially or thread again from there if I wanted multiple threads, no?

      At what point do I just put my code into a different perl file, serialize my data into another, and call it all with Win32::Process::Create?
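The queue management described here can be fairly small with the core Thread::Queue module. This is a rough sketch of a single long-running worker that is started before any thread-unsafe module is loaded and is then fed jobs on demand. The job layout (id, payload) and the "work" done are hypothetical placeholders, and the commented-out require line only marks where the curl module would be loaded.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $jobs    = Thread::Queue->new();
my $results = Thread::Queue->new();

# Start the worker early, before any thread-unsafe module is loaded.
my $worker = threads->create(sub {
    while (defined(my $job = $jobs->dequeue())) {
        # Hypothetical long-running task: here it only measures the payload.
        $results->enqueue("$job->{id}:" . length($job->{payload}));
    }
    return;
});

# Only now would the main thread load the module, for example:
# require WWW::Curl::Multi;

# Jobs are handed over on demand, together with the data they need.
$jobs->enqueue({ id => $_, payload => 'x' x $_ }) for 1 .. 3;
$jobs->end();        # no more jobs: dequeue() returns undef once the queue drains
$worker->join();

my @done;
while (defined(my $r = $results->dequeue_nb())) {
    push @done, $r;
}
print "$_\n" for @done;
```

Several such workers could share the same job queue if the tasks need to run in parallel; they would all have to be created before the module is loaded.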

        I'd suggest the reverse of the anonymous monk: put WWW::Curl into a thread by itself, don't load it in the main program, and serialise everything through there. You might be able to get that to multi-thread if each thread that uses W::C were to load it individually, but I doubt it.

        I suspect there are better (read: simpler / less prone to headdesk interactions) ways to run multiple HTTP requests simultaneously than trying to get WWW::Curl to play nice with perl threads. An event-based callback (POE, AnyEvent/Coro, etc.) where you can queue up multiple requests at once springs to mind, or pushing the actual W::C calls into forked worker children (that you then get back using either event or thread-based parents). Personally, I would usually use AnyEvent and either LWP::UserAgent (with the AnyEvent "hack" to LWP to get it to play nice with AE) or AnyEvent::HTTP. YMMV.
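The "forked worker children" option mentioned above can be sketched with core modules only. This assumes a real fork() (so it is not a fix for the Windows emulation), and fetch_in_child is a hypothetical placeholder for whatever WWW::Curl calls the child would actually make. The parent here collects results with a blocking read; an event loop or thread would replace that part in a real program.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for the real WWW::Curl work done inside a child.
sub fetch_in_child {
    my ($url) = @_;
    return "fetched $url";
}

# Fork a child for one URL and return its pid plus the read end of a pipe.
sub start_child {
    my ($url) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        close $reader;
        print {$writer} fetch_in_child($url), "\n";
        close $writer;        # flushes the buffered result into the pipe
        POSIX::_exit(0);      # skip END blocks and destructors in the child
    }

    close $writer;
    return { pid => $pid, reader => $reader };
}

# Read the single result line from a child and reap it.
sub collect {
    my ($job) = @_;
    my $line = readline($job->{reader});
    close $job->{reader};
    waitpid($job->{pid}, 0);
    chomp $line if defined $line;
    return $line;
}

# Start all children first so they run concurrently, then gather the results.
my @jobs    = map { start_child($_) } qw(http://example.com/a http://example.com/b);
my @results = map { collect($_) } @jobs;
print "$_\n" for @results;
```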
