I think I am one of those people who rushed into Perl threading. Coming from a background that made heavy use of pthreads, threading is part of my life, so the moment I saw threads and threads::shared, I embraced them.

I am not saying it was a bad experience, and I will certainly keep trying it as it improves. But I have started to put less effort into it and keep a little more distance. I think it is worth sharing some of my thoughts and experience:

Once I started to compare threaded and non-threaded solutions more fairly, I started to find better ones for my problems. I just want to share one example here.

A while ago, I posted An internet garbage filter. After I posted it, I actually spent a lot of effort improving it, but I was never really satisfied. Performance was one of the big issues. Also, by monitoring it closely, I found it was actually not that "multi-threaded": the threads were not responding near-equally, and the whole thing felt too much like single threading.

Now, for the same application, I have moved away from multi-threading and actually gained much better performance. Surprisingly, the behavior of the code is more multi-threaded-alike, and I can clearly observe multiple HTTP requests being answered "at the same time".

This is the key piece that creates the multi-threaded-alike effect (I added some inline comments):

while (1) {
    print "before accept\n";
    # I am still receiving new connections while waiting for
    # responses from existing connections
    if (my $browser = $browser_listener->accept()) {
        my $req    = recv_req($browser);
        my $remote = send_req($browser, $req);
        if ($remote) {
            $selector->add($remote);
            $pairs{$remote} = $browser;
        }
    }
    print "after accept\n";

    # with can_read, a slow site is not blocking sites with quick responses
    foreach my $remote ($selector->can_read(1)) {
        print "can read res from site " . inet_ntoa($remote->peeraddr()) . "\n";
        # this is just one chunk of response, and we move on to the next
        # connection, so short responses are not blocked by big files
        my $res = read_res($remote);
        if ($res) {
            send_res($pairs{$remote}, $res);
        }
        else {
            $selector->remove($remote);
            $remote->close();
            $pairs{$remote}->close();
            delete $pairs{$remote};
        }
    }
    print "finished checking can read\n";
}

If you are interested in the entire code, here it is:

use IO::Select;
use IO::Socket;
use strict;
use warnings;

my $banned_type = {
    "cab"   => 1,
    "class" => 1,
    "dat"   => 1,
    "exe"   => 1,
    "gif"   => 1,
    "ico"   => 1,
    "jpg"   => 1,
    #"js"   => 1,
    #"jsp"  => 1,
    "png"   => 1,
    "swf"   => 1
};

my $banned_site = {
    "ad.doubleclick.net"     => 1,
    "bar.baidu.com"          => 1,
    "lz.mainentrypoint.com"  => 1,
    "s.abetterinternet.com"  => 1,
    "search-itnow.com"       => 1,
    "www.ftstock.com"        => 1,
    "www.mainentrypoint.com" => 1,
    "www.mywebsearch.com"    => 1,
    "www.newshub.com"        => 1,
    "www.smileycentral.com"  => 1
};

my $trusted_site = {
    "a248.e.akamai.net" => 1,
    "www.hsbc.ca"       => 1
};

use constant RES_400 =>
    "404 HTTP/1.1 Banned\r\n\r\n<html><body>Banned</body></html>";
use constant FAILED_TO_CONNECT_REMOTE =>
    "404 HTTP/1.1 Banned\r\n\r\n<html><body>Failed to connect remote site %s</body></html>";

my $browser_listener = new IO::Socket::INET(
    Proto     => "tcp",
    LocalAddr => "localhost",
    LocalPort => 8080,
    Reuse     => 1,
    Timeout   => 1,
    Listen    => 10
) || die "failed to create browser listener";

my $selector = new IO::Select();
my %pairs    = ();

while (1) {
    print "before accept\n";
    if (my $browser = $browser_listener->accept()) {
        my $req    = recv_req($browser);
        my $remote = send_req($browser, $req);
        if ($remote) {
            $selector->add($remote);
            $pairs{$remote} = $browser;
        }
    }
    print "after accept\n";
    foreach my $remote ($selector->can_read(1)) {
        print "can read res from site " . inet_ntoa($remote->peeraddr()) . "\n";
        my $res = read_res($remote);
        if ($res) {
            send_res($pairs{$remote}, $res);
        }
        else {
            $selector->remove($remote);
            $remote->close();
            $pairs{$remote}->close();
            delete $pairs{$remote};
        }
    }
    print "finished checking can read\n";
}

sub recv_req {
    my ($browser) = @_;
    my $content_length = 0;
    my $req = "";
    while (1) {
        my $chunk;
        $browser->recv($chunk, 10000);
        if ($chunk =~ m/Content-Length: (\d*)/) {
            $content_length = $1;
        }
        $req .= $chunk;
        last if ($chunk =~ m/\r\n\r\n/);
    }
    $req =~ /(.*?)\r\n\r\n(.*)/s;
    if (length($2) > 0) {
        $content_length -= length($2);
        print "after -, content_length = $content_length\n";
    }
    while ($content_length > 0) {
        my $chunk;
        $browser->recv($chunk, $content_length);
        $req .= $chunk;
        $content_length -= length($chunk);
    }
    return $req;
}

sub send_req {
    my ($browser, $req) = @_;
    my $host = ($req =~ m/Host:\s*(.*?)\r/)[0];
    my $remote;

    # some web sites do not like the site repeated in GET/POST
    $req =~ s/$host//;    # only once
    $req =~ s/http:\/\///;
    $req =~ m/.*?\s+(.*?)\s/;
    my $page = $1;

    if (is_trusted_site($host)) {
        print "rcvd [$host, $page], trusted\n";
        if ($remote = new IO::Socket::INET(Proto => "tcp", PeerAddr => $host, PeerPort => 80)) {
            $remote->send($req);
        }
    }
    else {
        if (is_banned_site($host) || is_banned_type($page)) {
            print "rcvd [$host, $page], banned\n";
            print $browser RES_400;
            close($browser);
        }
        else {
            print "rcvd [$host, $page], not banned\n";
            if ($remote = new IO::Socket::INET(Proto => "tcp", PeerAddr => $host, PeerPort => 80)) {
                $remote->send($req);
            }
        }
    }
    return $remote;
}

sub read_res {
    my $remote = shift;
    my $res;
    $remote->recv($res, 100000);
    if (!length($res)) {
        $res = undef;
    }
    return $res;
}

sub send_res {
    my ($browser, $res) = @_;
    $browser->send($res);
}

sub is_banned_site {
    my $site = shift;
    return 1 if (exists($banned_site->{$site}));
    return 1 if ($site =~ m/offeroptimizer/);
    return 1 if ($site =~ m/revenue\.net$/);
    return 1 if ($site =~ m/popupsponsor\.com$/);
    return 1 if ($site =~ m/hitbox\.com$/);
    return 0;
}

sub is_trusted_site {
    my $site = shift;
    return 1 if (exists($trusted_site->{$site}));
    return 0;
}

sub is_banned_type {
    my $tmp = lc(shift);

# the block below is commented out via POD
=document
    $tmp =~ m/(.*?)\:\/\/((?:(?:.*)\/)*)(.*)/;
    my $proto = $1;
    my $path  = $2;
    my $file  = $3;
    my $query;
    my $type;
    if ($file) {
        ($file, $query) = split(/\?/, $file);
        ($file, $type)  = split(/\./, $file);
    }
=cut

    if ($tmp ne "/") {
        $tmp = (split /\//, $tmp)[-1];
        my $type = (split /\./, $tmp)[-1];
        if ($type && exists($banned_type->{$type})) {
            return 1;
        }
        else {
            return 0;
        }
    }
}
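As an aside, recv_req above relies on the Content-Length header to decide when a POST body has fully arrived. Here is a minimal, socket-free sketch of that parsing on a canned request (the request text is made up for illustration):

```perl
use strict;
use warnings;

# A canned POST request stands in for data recv'd from the browser.
my $req = "POST /submit HTTP/1.1\r\n"
        . "Host: www.example.com\r\n"
        . "Content-Length: 11\r\n"
        . "\r\n"
        . "hello=world";

# Same pattern recv_req uses to find the declared body length.
my ($content_length) = $req =~ m/Content-Length: (\d*)/;

# Split the headers from whatever part of the body arrived in the same chunk.
my ($headers, $body) = $req =~ /(.*?)\r\n\r\n(.*)/s;

# Bytes still to be read from the socket in the follow-up recv loop.
my $remaining = $content_length - length($body);

print "declared=$content_length got=", length($body), " remaining=$remaining\n";
```

Here the whole body came with the headers, so nothing remains to read; when the body trails behind, $remaining drives the second recv loop.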

Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by Aragorn (Curate) on Jan 07, 2004 at 11:01 UTC
    Using the select system call (which is what IO::Select does) is a tried and true technique for writing servers and other applications which need to multiplex I/O. The late W. Richard Stevens does an excellent job explaining this and other techniques in chapter 6 of UNIX Network Programming.
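    For readers new to the technique Aragorn describes, here is a minimal sketch of IO::Select multiplexing, using an in-process socketpair in place of real browser/remote connections (the handle names are made up):

```perl
use strict;
use warnings;
use Socket;
use IO::Socket;
use IO::Select;

# Two connected in-process sockets stand in for a proxy's browser/remote pair.
my ($s1, $s2) = IO::Socket->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $sel = IO::Select->new($s1);

# Nothing written yet: can_read times out after 1 second and returns nothing,
# so the event loop can go do other work instead of blocking in a read.
my @idle = $sel->can_read(1);

$s2->send("hello");

# Data is now pending, so $s1 comes back as ready without blocking.
my $buf = "";
for my $h ($sel->can_read(1)) {
    $h->recv($buf, 100);
}
print "got: $buf\n";
```

    The same shape scales to many handles: only the ones with data pending are returned, which is exactly why one slow site cannot stall the rest.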

    Arjen

Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by jeffa (Bishop) on Jan 07, 2004 at 16:00 UTC

    I concur with Aragorn: solve this problem with select rather than threads. A question: how many processors does the computer you run your threaded code on have? If it's just one, then the only way your code can achieve speed-up is if there is potential waiting due to I/O.

    I too have experience writing real threads in C with the pthread.h library (see Solved N-Queens (warning: C code ahead)). My computer at home is a dual processor, and the computers I used at school had dual and quad processors. We saw true speed-up, even the occasional super-linear speed-up, when we threaded our programs. I think a lot of people believe that threading a program just magically "speeds it up." This is simply not true, especially if they are only using a One Proc Box (apologies to Sammy Hagar). On a One Proc Box, speed-up happens when processes have to wait on things like I/O, and select is a good (yet hard to understand) solution for that.
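    The point about I/O waiting can be sketched without threads at all: below, select waits on two slow readers at once, so two one-second waits overlap into roughly one second total (the forked children and pipes just simulate slow web sites; the names are made up):

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Two child processes each "respond" after a 1-second delay, like two slow
# web sites. select waits on both pipes at once, so the total wait is about
# max(1,1) = 1 second, not the 2 seconds that two serial blocking reads cost.
my @readers;
for my $msg (qw(alpha beta)) {
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {    # child: pretend to be a slow site
        close $r;
        sleep 1;
        print {$w} $msg;
        close $w;
        exit 0;
    }
    close $w;
    push @readers, $r;
}

my $sel   = IO::Select->new(@readers);
my $start = time();
my @got;
while ($sel->count) {
    for my $r ($sel->can_read(5)) {
        sysread($r, my $buf, 100);
        if (length $buf) { push @got, $buf }
        else             { $sel->remove($r); close $r }    # EOF
    }
}
my $elapsed = time() - $start;
printf "replies: @{[sort @got]} in %.1fs\n", $elapsed;
```

    Run serially with plain blocking reads, the same two replies would take about two seconds; that difference is all the "speed-up" a One Proc Box has to offer.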

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      I only have one CPU, so I was not expecting any performance improvement. Instead I expected to lose some performance to all the context switching going on; however, because Perl's threads are very "heavy", the slowdown was bigger than one would expect.

      What I really expected was near-equal responses to multiple HTTP messages, but sadly that didn't happen. My guess is that it has something to do with Perl's current scheduling model.

      With select() and some careful thinking, I get much better results.

      Yeah, you are right.

Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by perrin (Chancellor) on Jan 07, 2004 at 18:20 UTC
    I just want to say thanks for being a pioneer and sharing your experiences. I don't have the opportunity to mess with threads much, but need to know how they are coming along, so reports from people like you are very valuable to me.
Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by tilly (Archbishop) on Jan 11, 2004 at 02:08 UTC
    Thank you for this follow-up. I'll update my opinion to say that threading is less usable in Perl 5.8 than I'd hoped.

    If you are interested in the long-term future of threading in Perl, I'd recommend that you get involved in the Parrot effort, which I understand is currently trying to nail down what its threading model will be.