I think I am one of those persons who rushed to perl threading. As I am from a background heavily used pthread, threading is part of my life, so the moment I saw threads and threads::shared, I braced it.
I am not saying that's a bad experience, and I will certainly keep trying it, as it gets improved. But I certainly started to reduce my effort on it and started to keep a little bit more distance. I think it is worth to share some of my thought and experience:
Once I started to become more fair with threaded and non-threaded solutions, I started to find better ones for my problems. And I just want to share one example here.
A while ago, I posted An internet garbage filter. After I posted it, actually I spent lots of effort to improve it, but I was never really satisfied. The performance was one of the big issues. Also by monitoring it closely, I found it is actually not that "multi-threadied". Threads are not getting near-equally responsed, and gives you too much feeling of single threading.
Now for the same application, I moved away from multi- threading, and actually gained much better performance. Surprisingly, the behavior of the code is more multi-threaded-alike, and I can clearly observe multiple HTTP requests are getting responsed "at the same time".
This is key piece that creates the multi-threaded-alike effect: (I added some inline comments)
while (1) { print "before accept\n"; if (my $browser = $browser_listener->accept()) { #I am still receiving new connections while waiting for respon +ses from existing connections my $req = recv_req($browser); my $remote = send_req($browser, $req); if ($remote) { $selector->add($remote); $pairs{$remote} = $browser; }; } print "after accept\n"; #with can_read, slow site is not blocking site with quick response +s. foreach my $remote ($selector->can_read(1)) { print "can read res from site " . inet_ntoa($remote->peeraddr( +)) . "\n"; my $res = read_res($remote);#this is just one chunk of respons +e, and we move to the next connection, so short responses are not blo +cked by big files if ($res) { send_res($pairs{$remote}, $res); } else { $selector->remove($remote); $remote->close(); $pairs{$remote}->close(); delete $pairs{$remote}; } } print "finished checking can read\n"; }
If you care the entire code, here it is:
use IO::Select; use IO::Socket; use strict; use warnings; my $banned_type = { "cab" => 1, "class" => 1, "dat" => 1, "exe" => 1, "gif" => 1, "ico" => 1, "jpg" => 1, #"js" => 1, #"jsp" => 1, "png" => 1, "swf" => 1 }; my $banned_site = { "ad.doubleclick.net" => 1, "bar.baidu.com" => 1, "lz.mainentrypoint.com" => 1, "s.abetterinternet.com" => 1, "search-itnow.com" => 1, "www.ftstock.com" => 1, "www.mainentrypoint.com" => 1, "www.mywebsearch.com" => 1, "www.newshub.com" => 1, "www.smileycentral.com" => 1 }; my $trusted_site = { "a248.e.akamai.net" => 1, "www.hsbc.ca" => 1 }; use constant RES_400 => "404 HTTP/1.1 Banned\r\n\r\n<html><body>Banned +</body></html>"; use constant FAILED_TO_CONNECT_REMOTE => "404 HTTP/1.1 Banned\r\n\r\n< +html><body>Failed to connect remote site %s</body></html>"; my $browser_listener = new IO::Socket::INET(Proto => "tcp", LocalAddr => "localhost", LocalPort => 8080, Reuse => 1, Timeout => 1, Listen => 10) || die "fail +ed to create browser listener"; my $selector = new IO::Select(); my %pairs = (); while (1) { print "before accept\n"; if (my $browser = $browser_listener->accept()) { my $req = recv_req($browser); my $remote = send_req($browser, $req); if ($remote) { $selector->add($remote); $pairs{$remote} = $browser; }; } print "after accept\n"; foreach my $remote ($selector->can_read(1)) { print "can read res from site " . inet_ntoa($remote->peeraddr( +)) . "\n"; my $res = read_res($remote); if ($res) { send_res($pairs{$remote}, $res); } else { $selector->remove($remote); $remote->close(); $pairs{$remote}->close(); delete $pairs{$remote}; } } print "finished checking can read\n"; } sub recv_req { my ($browser, $thread_id) = @_; my $content_length = 0; my $req = ""; while (1) { my $chunk; $browser->recv($chunk, 10000); if ($chunk =~ m/Content-Length: (\d*)/) { $content_length = $1; } $req .= $chunk; last if ($chunk =~ "\r\n\r\n"); } $req =~ /(.*?)\r\n\r\n(.*)/s; if (length($2) > 0) { $content_length -= length($2); print "after -, content_length = $content_length\n"; } while ($content_length > 0) { my $chunk; $browser->recv($chunk, $content_length); $req .= $chunk; $content_length -= length($chunk); } return $req; } sub send_req { my ($browser, $req) = @_; my $host = ($req =~ m/Host:\s*(.*?)\r/)[0]; my $remote; #some web site does not like repeat siet in get/post $req =~ s/$host//; #only once $req =~ s/http:\/\///; $req =~ m/.*?\s+(.*?)\s/; my $page = $1; if (is_trusted_site($host)) { print "rcvd [$host, $page], trusted\n"; if ($remote = new IO::Socket::INET(Proto => "tcp", PeerAddr => + $host, PeerPort => 80)) { $remote->send($req); } } else { if (is_banned_site($host) || is_banned_type($page)) { print "rcvd [$host, $page], banned\n"; print $browser RES_400; close($browser); } else { print "rcvd [$host, $page], not banned\n"; if ($remote = new IO::Socket::INET(Proto => "tcp", PeerAdd +r => $host, PeerPort => 80)) { $remote->send($req); } } } return $remote; } sub read_res { my $remote = shift; my $res; $remote->recv($res, 100000); if (!length($res)) { $res = undef; } return $res; } sub send_res { my ($browser, $res) = @_; $browser->send($res); } sub is_banned_site { my $site = shift; return 1 if (exists($banned_site->{$site})); if ($site =~ m/offeroptimizer/) { return 1; } if ($site =~ m/revenue.net$/) { return 1; } if ($site =~ m/popupsponsor.com$/) { return 1; } if ($site =~ m/hitbox.com$/) { return 1; } return 0; } sub is_trusted_site { my $site = shift; return 1 if (exists($trusted_site->{$site})); } sub is_banned_type { my $tmp = lc(shift); =document $tmp =~ m/(.*?)\:\/\/((?:(?:.*)\/)*)(.*)/; my $proto = $1; my $path = $2; my $file = $3; my $query; my $type; if ($file) { ($file, $query) = split(/\?/, $file); ($file, $type) = split(/\./, $file); } =cut if ($tmp ne "/") { $tmp = (split /\//, $tmp)[-1]; my $type = (split /\./, $tmp)[-1]; if ($type && exists($banned_type->{$type})) { return 1; } else { return 0; } } }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by Aragorn (Curate) on Jan 07, 2004 at 11:01 UTC | |
|
Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by jeffa (Bishop) on Jan 07, 2004 at 16:00 UTC | |
by pg (Canon) on Jan 07, 2004 at 16:14 UTC | |
|
Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by perrin (Chancellor) on Jan 07, 2004 at 18:20 UTC | |
|
Re: My experience with Perl threading and the reason I think that I have rushed a little bit
by tilly (Archbishop) on Jan 11, 2004 at 02:08 UTC |