However, since TCP sockets, which really receive their data in
packets, look like any other pipe, it would be cool if
I could define a regexp, let it munch on the socket for a while,
and end up with a match and the socket in such a state that reading
from it would give me what would have been $' ($' itself would
obviously be undefined...). I even imagine this
could be technically possible, at least if the regexp is
fully non-greedy (so it wouldn't need to read further
down the socket just to see whether a longer match was possible).
However, I don't see a way to do this with current perl.
Is there one? Should somebody code it?
Currently, if I want to run a regexp that isn't line-based on
a socket, I have to do
    undef $/;
    $slurp=<SOCKET>;
    $slurp =~ m/wha+t\n?\s*ever/;
That kinda pisses me off, because then I have to emulate the
pipe behaviour with
    $s=(length $`)+(length $&);
    substr($slurp,$s) =~ m/post\s*match/;
to do what I want, plus I have to download more than I actually
needed. (I'm not quite sure about that.)
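For what it's worth, the "continue after the first match" step can also be written without $` and $&. This is only a minimal sketch, assuming the data is already in a scalar; the sample string and the names $after_first and $tail are made up for illustration. A scalar-context match with /g remembers where it stopped in pos(), and the next /g match on the same string resumes from there.

```perl
use strict;
use warnings;

# Stand-in for the slurped socket data (made-up sample text).
my $slurp = "junk whaaat\n  ever, then post   match and more";

my ($after_first, $tail);
if ($slurp =~ m/wha+t\n?\s*ever/g) {
    # In scalar context, /g records the end of the match in pos($slurp).
    $after_first = pos $slurp;

    # The next /g match starts searching at pos($slurp), so it only
    # looks at the text after the first match, like a pipe would.
    if ($slurp =~ m/post\s*match/g) {
        # Whatever follows the second match, i.e. the would-be $'.
        $tail = substr($slurp, pos $slurp);
    }
}

print "first match ended at offset $after_first\n" if defined $after_first;
print "left over: '$tail'\n" if defined $tail;
```

This avoids copying the string with substr before the second match, and it sidesteps $` and $&, which on older perls are known to slow down every regexp in the program.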
I don't know what you mean by "download more than I need", since whatever you read from the socket descriptor is already in your computer's socket buffer. The penalty for reading 400 KB rather than 40 KB is negligible. Whether or not the current packet contains the characters you are after, the search will reveal it. When you do $data=<SOCKET>, that does not send a request to the host for data; it snatches the data that's already in the buffer, waiting for more to arrive only if it needs it. If you wanted simulated network pipes, you would have to read one byte at a time, sending a request for each byte and checking again and again for a match, which is obviously very inefficient (like telnet over TCP). Mark_Dominus's solution is OK, but it certainly doesn't behave any differently from reading in the data buffer, stopping where the regex matches, and returning the rest. Your best bet is NOT to try to force a "fluid pipe" behavior on something that comes in chunks, which should be self-descriptive (the fact that a chunk/packet is waiting is often enough to determine what its purpose is). Snatching what you need and discarding the rest is the ONLY solution.
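The "read the buffer, stop where the regex matches, return the rest" approach could look roughly like the sketch below. It is an illustration under assumptions, not anybody's actual module: read_until_match is a hypothetical helper, the chunk size of 8 is deliberately tiny to force several reads, and a local socketpair stands in for a real TCP connection. Note that it reports the first match it sees, so a greedy pattern might have matched more if another chunk had been read first.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: read $sock in chunks until $re matches.
# Returns (matched text, bytes already read past the match), or an
# empty list if the stream ends without a match.
sub read_until_match {
    my ($sock, $re, $chunk) = @_;
    $chunk ||= 4096;
    my $buf = '';
    while (1) {
        # Append the next chunk to the end of what we already have.
        my $n = sysread($sock, $buf, $chunk, length $buf);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;                      # EOF, nothing more to read
        if ($buf =~ /$re/) {                  # rescans the whole buffer
            my $matched = substr($buf, $-[0], $+[0] - $-[0]);
            my $rest    = substr($buf, $+[0]);
            return ($matched, $rest);
        }
    }
    return;
}

# Demo: a local socket pair stands in for the network connection.
socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
defined syswrite($writer, "junk whaaat\n  ever and post match")
    or die "syswrite: $!";
close $writer;

my ($match, $rest) = read_until_match($reader, qr/wha+t\n?\s*ever/, 8);

# Anything not yet read is still sitting in the socket.
my $remaining = '';
1 while sysread($reader, $remaining, 4096, length $remaining);

print "matched: '$match'\n";
print "after the match: '$rest$remaining'\n";
```

The caller has to keep $rest around, because those bytes have already been taken off the socket; only the data after them can still be read from the handle.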
AgentM Systems nor Nasca Enterprises nor
Bone::Easy nor Macperl is responsible for the
comments made by
AgentM. Remember, you can build any logical system with NOR.
I see. Hmm. But... I tested this once; the program made two HTTP
requests: one closed the socket as soon as it found what
it wanted in the response headers, the other read the whole response.
The first one was faster, so I assumed that killing
the socket somehow saved me time. I guess it was just a coincidence;
I never repeated the test, since it was only a side effect.
Now, forgive my ignorance of TCP issues (I did a tutorial on
it, but it mostly dealt with the ACK/SYN ping-pong), but how does
it decide when to download, then? When I open a socket, do I
automatically start sucking data into the buffer? So if I
opened a socket to a connection that just kept feeding
and feeding data, and left the socket alone, would I get a buffer
overflow eventually?-)