RE: Re: reading from sockets

However, since TCP sockets, which really receive their data in packets, look like any other pipe, it would be cool if I could define a regexp, let it munch through the socket for a while, and end up with a match and the socket in a state where reading from it gives me what would have been $' ($' itself would obviously be undefined...). I imagine this is technically possible, at least if the regexp is fully non-greedy (so it wouldn't need to read further down the socket just to see whether a longer match was possible). However, I don't see a way to do this with current perl. Is there one? Should somebody code it?

Currently, if I want to run a line-free regexp on a socket, I have to

local $/;                       # slurp mode: read until EOF, not line by line
my $slurp = <SOCKET>;           # blocks until the peer closes the connection
$slurp =~ m/wha+t\n?\s*ever/;

to do that. That kinda pisses me off, because then I'll have to emulate the pipe behaviour

my $s = $+[0];                  # offset just past the match, same as length($`) + length($&)
substr($slurp, $s) =~ m/post\s*match/;

to do what I want, plus I have to download more than I actually need (though I'm not quite sure about that).

RE: RE: Re: reading from sockets by AgentM (Curate) on Nov 13, 2000 at 00:50 UTC
I don't know what you mean by "download more than I need", since whatever you read from the socket descriptor is already in your computer's socket buffer. The penalty for reading in 400 Kb vs. 40 Kb is negligible. Whether or not the current packet contains the characters you are looking for, the search will tell you. When you do $data=<SOCKET>, that does not send a request to the host for data; it snatches the data that's already in the buffer. If you wanted simulated network pipes, you would have to read one byte at a time after sending a request for each byte, checking and checking for a match, which is obviously very inefficient (like TCP telnet). Mark_Dominus's solution is OK, but it certainly doesn't behave any differently from reading in the data buffer, stopping where the regex matches, and returning the rest. Your best bet is NOT to try to force a "fluid pipe" behavior on something that will come in chunks, which should be self-descriptive (the fact that a chunk/packet is waiting is often enough to determine what its purpose is). Snatching what you need and discarding the rest is the ONLY solution.

AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
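For what it's worth, the "read the buffer, stop where the regex matches, return the rest" idea can be sketched roughly as below. This is only an illustration: read_until_match is a made-up helper name, the pipe stands in for a real socket so the snippet runs on its own, and the tiny chunk size of 8 is just there to force several reads. Note that it only behaves predictably with patterns that cannot grow if more data arrives; a greedy pattern may match a shorter stretch than it would have on the full stream.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in chunks until $re matches the accumulated
# buffer. Returns (matched text, leftover bytes already read past the match),
# or an empty list if EOF arrives before any match.
sub read_until_match {
    my ($fh, $re, $chunk) = @_;
    $chunk ||= 4096;
    my $buf = '';
    while (1) {
        if ($buf =~ $re) {
            # @- and @+ hold the start and end offsets of the last match.
            return (substr($buf, $-[0], $+[0] - $-[0]), substr($buf, $+[0]));
        }
        # Append the next chunk to the end of the buffer.
        my $n = sysread($fh, $buf, $chunk, length $buf);
        die "read failed: $!" unless defined $n;
        return if $n == 0;    # EOF without a match
    }
}

# Demo: a pipe plays the role of the socket (an assumption for illustration).
pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "junk whaaat\n  ever and then post  match, tail");
close $w;

my ($hit, $rest) = read_until_match($r, qr/wha+t\n?\s*ever/, 8);

# Whatever was not consumed is still waiting on the handle.
sysread($r, my $tail, 4096);
my $post_match = $rest . $tail;    # plays the role of $'
```

The leftover has to be carried around by hand, since bytes that have been read cannot be pushed back onto the socket; that is the price of not having a real "regexp on a pipe".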
RE: RE: RE: Re: reading from sockets by kaatunut (Scribe) on Nov 13, 2000 at 01:55 UTC
soka. hmm. but... I tested this once; the program did two HTTP requests, one that closed the socket as soon as it found what it wanted in the response headers, and another that read the whole response. The first one was faster, so I assumed that somehow killing the socket saved me time. Guess it was just coincidence; I didn't really repeat the test, it was only a side effect. Now, forgive my ignorance on TCP issues (I did a tutorial on it, but it mostly dealt with ACK/SYN ping-pong), but how does it decide when to download, then? When I open a socket, do I automatically start sucking data into the buffer? So if I opened a socket to a connection that would just keep feeding and feeding data, and left the socket alone, I'd get a buffer overflow eventually?-)
RE: RE: RE: RE: Re: reading from sockets by AgentM (Curate) on Nov 13, 2000 at 02:11 UTC
The deal is this. Every socket has a receive buffer of some arbitrary but finite size (so it can fill up). When you open a TCP connection you usually put some protocol layer on top of TCP, perhaps HTTP, FTP, etc. This layer determines what the server sees (or wants to see, anyway) and what the client sees. Once the SYN/ACK handshake is complete, it is entirely up to the protocol to decide what happens next. For all you know, the client could send some info, wait for the server to process it and send it back, like echo servers. If the server takes a lot of time to process it, then the only thing the client can do is wait, perhaps with a UNIX select.

Your problem is that you are imagining a socket as an ever-flowing pipe, which is not true at all. Yes, TCP provides options for LINGERing and SYN-checking, but otherwise data can arrive erratically or in bursts. With TCP (a connection-oriented protocol), data comes in in chunks called packets (which involve an IP layer, a TCP layer, padding, and other junk). I have an old book that tells me most UNIX socket buffers are at least a few Mb. Nowadays, I'm sure they're larger, especially with the onslaught of 100Mb/s stuff. If you process quickly enough, you're unlikely ever to fill the buffer, but if you do, TCP flow control tells the server that there is no room left (the advertised receive window drops to zero), so the server holds the data and sends it later, once there is room again. That's TCP! You don't even have to worry about it!

When you read <SOCKET>, you may be reading one or more packets, or you may block because the buffer is empty. You can pretty much read from the socket buffer as you wish, without worrying about details such as setsockopt. When you first open a socket, its buffer is empty. Only when a TCP packet has arrived and passed its checks is an ACK sent and the data part of the packet handed to your perl script via the buffer.
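To make the "wait with select, then read whatever is in the buffer" point concrete, here is a minimal sketch using the core IO::Select module. The helper name read_available is made up for this example, and the socketpair is only a local stand-in for a real TCP connection (so this assumes a UNIX-like system).

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Hypothetical helper: wait up to $timeout seconds for data, then read
# whatever is buffered (at most $max bytes). Returns undef on timeout and
# the empty string on EOF. Call it in scalar context.
sub read_available {
    my ($fh, $timeout, $max) = @_;
    my $sel = IO::Select->new($fh);
    return undef unless $sel->can_read($timeout);
    my $n = sysread($fh, my $buf, $max || 4096);
    die "read failed: $!" unless defined $n;
    return $n ? $buf : '';
}

# Demo: a local socket pair plays client and server (illustration only).
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $nothing_yet = read_available($client, 0.1);   # peer is silent: times out
syswrite($server, "hello");
my $got = read_available($client, 1);             # data is already buffered
close $server;
my $eof = read_available($client, 1);             # peer closed: end of file
```

The read never asks the other side for anything; it only picks up what has already arrived, or gives up after the timeout instead of blocking forever.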
Yes, if you close the connection once you have the data you need, nothing lethal happens. This is perfectly legal and an easy way to speed up the program. The TCP layer will alert the server that the connection has been closed (in TCP, you can half-close connections, but that's a different story) and the server won't bother to send anything more. In this case, it's just like reading a file: read as much as you need and scrap the rest. Why would you even need to read the rest? So closing the socket when you are done with it is perfectly fine.
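A rough sketch of "read the headers, then hang up" follows. Everything here is illustrative: the response text is made up, and a local socketpair stands in for the HTTP connection (UNIX-like systems assumed). On this local pair the sender's next write fails immediately; over a real TCP connection the sender only learns of the close after a network round trip, so a write or two may still appear to succeed first.

```perl
use strict;
use warnings;
use Socket;

socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The "server" sends headers plus the start of a body the client never wants.
syswrite($server, "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nbody...");

# Client: read until the blank line that ends the headers, then hang up.
my $buf = '';
until ($buf =~ /\r\n\r\n/) {
    my $n = sysread($client, $buf, 16, length $buf);
    die "read failed: $!" unless defined $n;
    last if $n == 0;    # EOF before the headers ended
}
my ($headers) = $buf =~ /\A(.*?\r\n)\r\n/s;
close $client;

# Server: further writes now fail instead of delivering data.
$SIG{PIPE} = 'IGNORE';    # get an error return rather than a fatal signal
my $sent = syswrite($server, "more body");
my $write_failed = !defined $sent;
```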
