
Folks, I'm looking for suggestions on how to improve the efficiency of a program I use that does non-blocking HTTP IO, often with 1000+ open sockets. The central action of the program is characterized by the simplified code snippet further down. My thanks to liverpole for reminding me of the Perl module IO::Select, which I had used before but which was not used in this code that I inherited from the original author. I suspect that much of the time is spent checking socket availability too often. I would like to limit my calls to IO::Select::can_read so that they happen only when there is pending IO, but I have been unsuccessful in finding such a mechanism. Is there any way to implement the following pseudocode more efficiently than just calling IO::Select's can_read() every time one needs to check whether any socket IO is pending?

$SIG{INTERUPT_ON_PENDING_SOCKET_IO} = \&ckSockets;

Another optimization possibility

Even though we may have 1000+ open sockets, the activity at any one time is sparse. I've been speculating about going back to the bit-vector version of select and scanning the 1000+ bit vector 32 bits at a time. I'm not optimistic about this approach. For all I know, the implementer of IO::Select may already do this.

Highly simplified version of my current code

use IO::Select;
...
# Check for and process any pending socket input.
# Avoid stepping on toes by keeping a running list of ready
# sockets and processing it until empty.
sub ckSockets {           # Returns: # of ready so we can tell activity
    ...
    # Timeout of 0: poll and return immediately, never block
    my @breadys = $io_select_obj->can_read(0);
    foreach my $fd_key (@breadys) {
        # ...read and process data from this socket...
    }
}
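For reference, the bit-vector idea mentioned under "Another optimization possibility" could look roughly like the sketch below. It is not taken from the real program: the helper name ready_filenos and the two demo pipes are hypothetical, and it assumes a Unix-like system where 4-argument select works on pipes. It calls the built-in select directly and then scans the result 32 bits at a time, skipping words that are entirely zero. Whether this beats IO::Select in practice is untested here; IO::Select is itself built on the same select call.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file numbers whose bits are set in a
# select() bit vector, examining 32 bits at a time and skipping any
# 32-bit word that is entirely zero.
sub ready_filenos {
    my ($vec) = @_;
    # Pad to a multiple of 4 bytes so unpack sees whole 32-bit words.
    my $padded = $vec . ("\0" x ((4 - length($vec) % 4) % 4));
    my @words  = unpack 'L*', $padded;
    my @ready;
    for my $w (0 .. $#words) {
        next unless $words[$w];    # 32 idle descriptors skipped at once
        for my $fd ($w * 32 .. $w * 32 + 31) {
            push @ready, $fd if vec($padded, $fd, 1);
        }
    }
    return @ready;
}

# Demo with two pipes standing in for sockets; only the second has data.
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
syswrite($w2, "x") or die "syswrite: $!";

my $rin = '';
vec($rin, fileno($_), 1) = 1 for $r1, $r2;

# select() overwrites its arguments, so work on a copy of the mask.
my $rout   = $rin;
my $nfound = select($rout, undef, undef, 1);   # wait at most 1 second
die "select: $!" if $nfound == -1;

my @ready = ready_filenos($rout);
print "ready filenos: @ready\n";
```

Note that the sketch yields file numbers rather than handles, so a real program would also need a lookup table from fileno to socket object.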

Looking at the top of the profiled run below, we see that the time appears to be dominated by calls to the socket test:

[root@ibm-blade-blade0 testbuddy]# time dprofpp
Total Elapsed Time = 1790.060 Seconds
  User+System Time = 1315.990 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 55.9   736.8 753.00 317833   0.0002 0.0002  IO::Select::can_read
 9.05   119.1 256.27 363021   0.0000 0.0001  BuddyUsers::log
 5.25   69.14 137.12 350999   0.0000 0.0000  tsprint::ts
 5.17   67.97 67.979 350999   0.0000 0.0000  POSIX::strftime

Update: Updated to correct spelling on IO::Select, add '0' to can_read to reflect actual code.

In reply to Socket IO with large (>1000) numbers of open sockets by Ray Smith
