CBAS has asked for the wisdom of the Perl Monks concerning the following question:

I'm a little, no, I'm very tired and I can't figure out a good/nice/clean/legal way to read from a socket (INET) until it matches a certain string (4 characters, to be exact).
I'd do byte-by-byte reading if it didn't suck so much, but is there really no efficient way to do this type of thing?
I was sorta thinking of changing the default EOL character, but I couldn't find out how to do that (if it's even possible) ...

Anyone out there with some help for a poor kid like me?

thanks,
CBAS

Replies are listed 'Best First'.
Re: reading from sockets
by Dominus (Parson) on Nov 12, 2000 at 06:31 UTC
    Try this:

    { local $/ = "abcd"; $data = <SOCKET>; }
    This should read data from the socket until it sees "abcd" and then it will stop.
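    A minimal sketch of the $/ approach above. An in-memory filehandle stands in for the socket, and the sample data and variable names are assumptions made up for the demo. Note that $/ is treated as a literal string, not a regexp, and that the returned data includes the terminator unless you chomp it while $/ is still localized.

```perl
use strict;
use warnings;

# Stand-in for the socket: an in-memory filehandle (assumption for this demo).
my $incoming = "HELLO worldabcdleftover bytes";
open(my $sock, '<', \$incoming) or die "cannot open in-memory handle: $!";

my ($data, $rest);
{
    local $/ = "abcd";               # record terminator is now the 4-char string
    $data = <$sock>;                 # reads up to and including "abcd"
    chomp $data if defined $data;    # chomp removes the current $/, i.e. "abcd"
}

# Outside the block $/ is back to "\n"; with no newline left, this reads to EOF.
$rest = <$sock>;

print "before terminator: $data\n";
print "still unread:      $rest\n";
```

    On a real socket the read blocks until the terminator arrives or the peer closes the connection.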
Re: reading from sockets
by AgentM (Curate) on Nov 12, 2000 at 06:29 UTC
    You can never know ahead of time what will come in on a socket. The socket is buffered by the OS drivers, and the only way to read "up until a point" is to read the data in segments (packets), find the point you want, and truncate the rest (or ignore it). Hopefully you are using TCP (vs. UDP), so the data is more likely to arrive intact. Read the packet in (you should know its size, perhaps with a peek) and call split(/1234/,$buffer,2); on the data. The limit has to be 2, since a limit of 1 returns the whole buffer unsplit. The first element returned is the data that is relevant to you.
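    The chunked approach described above could look roughly like this. It is only a sketch: read_until is a hypothetical helper, the 4096-byte chunk size is arbitrary, and a pipe stands in for the TCP socket so the example runs on its own. Bytes that arrive after the delimiter are kept in the buffer rather than thrown away, so they are available to the next call.

```perl
use strict;
use warnings;

# Hypothetical helper: read from $fh in chunks until $delim shows up.
# Returns the data before the delimiter (undef on EOF without a match).
# Anything read past the delimiter stays in $$buf_ref for the next call.
sub read_until {
    my ($fh, $delim, $buf_ref) = @_;
    while (1) {
        my $pos = index($$buf_ref, $delim);
        if ($pos >= 0) {
            my $head = substr($$buf_ref, 0, $pos);
            # Drop the returned data and the delimiter from the buffer.
            substr($$buf_ref, 0, $pos + length($delim), '');
            return $head;
        }
        # Append the next chunk at the end of the buffer.
        my $n = sysread($fh, $$buf_ref, 4096, length($$buf_ref));
        die "sysread failed: $!" unless defined $n;
        return undef if $n == 0;    # EOF before the delimiter was seen
    }
}

# Stand-in for the socket: a pipe we fill ourselves (assumption for this demo).
pipe(my $reader, my $writer) or die "pipe failed: $!";
syswrite($writer, "first record1234second record1234tail");
close $writer;

my $buffer = '';
my $first  = read_until($reader, '1234', \$buffer);
my $second = read_until($reader, '1234', \$buffer);
print "$first\n$second\n";
print "left in buffer: $buffer\n";
```

    Because the search runs on the accumulated buffer, a delimiter that is split across two chunks is still found.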
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
      However, since TCP sockets, which really receive their data in packets, look like any other pipe, it would be cool if I could define a regexp and, after a while of munching, end up with a match and the socket in such a state that reading from it would give me what would have been $' ($' itself would obviously be undefined...). I even imagine this is technically possible, at least if the regexp is fully non-greedy (so it wouldn't need to read further down the socket just to see whether a longer match was possible). However, I don't see a way to do this with current perl. Is there one? Should somebody code it?

      Currently, if I want to run a line-free regexp on a socket, I have to

      undef $/; $slurp=<SOCKET>; $slurp =~ m/wha+t\n?\s*ever/;
      to do that. That kinda pisses me off, because then I have to emulate the pipe behaviour with

      $s=(length $`)+(length $&); substr($slurp,$s) =~ m/post\s*match/;
      to do what I want, plus I have to download more than I actually need. (I'm not quite sure about that.)
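      For what it's worth, the offset arithmetic above can be sketched without $` and $&, using the @+ array (available since perl 5.6), where $+[0] holds the offset just past the end of the last successful match. The sample string is made up, and a plain scalar stands in for the slurped socket data.

```perl
use strict;
use warnings;

# Stand-in for data slurped from the socket (assumption for this demo).
my $slurp = "header what\n  ever post   match trailer";

my $rest;
if ($slurp =~ m/wha+t\n?\s*ever/) {
    # $+[0] is the offset just past the match, so this is what $' would hold.
    $rest = substr($slurp, $+[0]);
}

# Continue matching on the part after the first match, as a pipe would.
my $found = (defined($rest) && $rest =~ m/post\s*match/) ? 1 : 0;

print "rest: '$rest'\n" if defined $rest;
print "second pattern found: $found\n";
```

      This only emulates the pipe behaviour on data that has already been read; it does not stop the read itself at the match.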
        I don't know what you mean by "download more than I need", since whatever you read in on the socket descriptor is already in your computer's socket buffer. The penalty for reading in 400 KB rather than 40 KB is negligible. Whether or not the current packet contains those four characters, the search will reveal it. When you do $data=<SOCKET>, that does not send a request to the host for data; it snatches the data that's already in the buffer. If you wanted simulated network pipes, you would have to read in one byte at a time after sending a request for each byte, checking again and again for a match, which is obviously very inefficient (like TCP telnet). Dominus's solution is OK, but it certainly doesn't behave any differently from reading in the data buffer, stopping where the regex matches, and returning the rest. Your best bet is NOT to try to force a "fluid pipe" behavior on something that will come in chunks, which should be self-descriptive (the fact that a chunk/packet is waiting is often enough to determine what its purpose is). Snatching what you need and discarding the rest is the ONLY solution.