in reply to Weird IO::Socket problem

I think it would be useful for you to narrow the problem down. From what I understand, the client is connected - and remains connected - to the server, but the data stops flowing over the connection despite the fact that the file you're tailing continues to grow.

I suggest writing a quick test script to run File::Tail on the file in exactly the same way as you do in interact():

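    use File::Tail;

    my $filename = ...;    # the file you tail in interact()
    my $file = File::Tail->new(name => $filename, maxinterval => 5, interval => 1,
                               tail => -1, errmode => \&do_exit);  # do_exit from your code
    while ( defined( my $line = $file->read ) ) {
        print localtime() . " " . $line;
    }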
I suggest you run this on the server host, in parallel with your existing client/server code, and redirect the output to a file. If the problem recurs, check whether this little test script stalled as well. If it did, you know the problem is with File::Tail; if not, it's most likely a network issue.

Alternatively, of course, you could add some extra logging to your existing server code. You'll probably end up having to do that anyway.

In general, I don't think you can ever collect too much debugging information in a situation like this. Make every part of your code write something to a log saying what it's doing.
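As a starting point for that kind of logging, here is a minimal sketch of a timestamped log helper. The names logmsg and set_log_handle are hypothetical (not from any module); it uses only core Perl (POSIX strftime and IO::Handle autoflush). Autoflush matters here: if the process hangs, buffered log lines would otherwise never reach the file.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use IO::Handle;

# Where log lines go; defaults to STDERR. Flush every line so nothing is
# left sitting in a buffer if the process hangs or is killed.
my $LOG = \*STDERR;
$LOG->autoflush(1);

# Hypothetical helper: send log output to another open filehandle.
sub set_log_handle {
    my ($fh) = @_;
    $fh->autoflush(1);
    $LOG = $fh;
}

# Hypothetical helper: one line per event, "YYYY-MM-DD HH:MM:SS [pid] message".
sub logmsg {
    my ($fmt, @args) = @_;
    my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);
    my $msg   = sprintf($fmt, @args);
    printf {$LOG} "%s [%d] %s\n", $stamp, $$, $msg;
}
```

In the server you would sprinkle calls such as logmsg('read %d bytes from %s', length $line, $filename) around every read, write and reconnect, so the log shows the last thing each part of the code did before the stall.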

Re^2: Weird IO::Socket problem
by hallikpapa (Scribe) on Jan 14, 2008 at 20:12 UTC
    OK, I caught the interface dropping packets. It's a Gig-E interface, and it does retransmit dropped packets. My question is: what can I do about this on the script side? It seems as though if packets get dropped a lot, the script pauses and doesn't recover. I would like it to be able to pick up the retransmitted packets and continue where it left off. Is this possible with IO::Socket? Is it possible in POE? I am eventually going to rewrite the whole process using POE, but this may speed up the process.
      A small number of dropped packets shouldn't be a problem, but if you are having spikes where very many packets are being dropped - the majority, say - this will kill the performance of your TCP connection, possibly to the point where it's indistinguishable from "not working at all."
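      On the script side, note that TCP retransmission is done by the kernel's TCP stack, not by your program: IO::Socket (or POE, which sits on top of the same sockets) only ever sees an ordered byte stream, so there are no retransmitted packets for the script to capture. Heavy loss just makes the data arrive late. What the script can do is notice a stall instead of blocking forever. A minimal sketch, assuming a connected socket; read_with_timeout is a hypothetical helper built on IO::Select and sysread (unbuffered, so select and the read agree about what is pending):

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: wait at most $timeout seconds for data on $sock.
# Returns ($data, 'ok'), (undef, 'timeout'), (undef, 'eof') or
# (undef, 'error: ...'). sysread is used instead of <$sock> because
# select() knows nothing about data already sitting in Perl's buffer.
sub read_with_timeout {
    my ($sock, $timeout) = @_;
    my $sel   = IO::Select->new($sock);
    my @ready = $sel->can_read($timeout);    # empty list on timeout
    return (undef, 'timeout') unless @ready;
    my $n = sysread($sock, my $buf, 4096);
    return (undef, "error: $!") unless defined $n;
    return (undef, 'eof') if $n == 0;
    return ($buf, 'ok');
}

# Example client loop; $sock would be your connected IO::Socket::INET:
# while (1) {
#     my ($data, $why) = read_with_timeout($sock, 60);
#     if ($why eq 'ok')      { print $data; next }
#     if ($why eq 'timeout') { warn scalar(localtime), ": no data for 60s\n"; next }
#     warn "connection ended: $why\n";
#     last;    # reconnect logic would go here
# }
```

      Logging each timeout with a timestamp also gives you something to line up against the interface's packet-loss spikes.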

      If you want to debug the TCP side of it I suggest using tcpdump or maybe ethereal to see what happens during the connection. You will need to understand the TCP protocol in some detail (but this is WORTH knowing!) and you might have to collect gigs and gigs of capture log before you actually see the fault occur - but it should be revealing.

      On the other hand, if you're fairly certain that the problem you're seeing corresponds with major spikes of packet loss, there's not much point spending a lot of effort to confirm something that you already know - the network is broken and that's hurting your script.

      This really isn't a Perl issue any more, but you should not be seeing packet loss in a typical GigE LAN environment. Something is probably wrong. Maybe your host can't actually keep up with the network traffic (if you're using a 486 machine, I'd start with that!). Maybe the cabling isn't up to Cat5e spec, the minimum for GigE over copper; maybe the switch fabric is overloaded; maybe someone has an industrial compressor on the same circuit as your switch (this has happened to me). Maybe your routing is seriously screwed up and all your supposed LAN traffic is going over and back on a 64K satellite link to Helsinki (this has also happened to me!).

      But I'd certainly try to pin down the cause of that packet loss.
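      One way to correlate the stalls with packet loss on the host itself, assuming a Linux box: the kernel exposes per-interface counters as one-number files under /sys/class/net/<iface>/statistics/. A minimal sketch that snapshots a few of them so you can log how much they grew around a stall; the helper names and the interface name eth0 are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper (Linux only): snapshot a few error/drop counters that
# the kernel exposes as files under /sys/class/net/<iface>/statistics/.
# $root can be overridden for testing.
sub read_counters {
    my ($iface, $root) = @_;
    $root //= '/sys/class/net';
    my %count;
    for my $name (qw(rx_dropped rx_errors tx_errors collisions)) {
        open my $fh, '<', "$root/$iface/statistics/$name" or next;  # skip missing counters
        my $value = <$fh>;
        next unless defined $value;
        chomp $value;
        $count{$name} = $value;
    }
    return \%count;
}

# Hypothetical helper: how much each counter grew between two snapshots.
sub counter_delta {
    my ($before, $after) = @_;
    my %delta;
    for my $name (keys %$after) {
        next unless defined $before->{$name};
        $delta{$name} = $after->{$name} - $before->{$name};
    }
    return \%delta;
}

# Example: take a snapshot, wait, and print any counters that moved:
# my $t0 = read_counters('eth0');
# sleep 60;
# my $d  = counter_delta($t0, read_counters('eth0'));
# printf "%s grew by %d\n", $_, $d->{$_} for grep { $d->{$_} } sort keys %$d;
```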

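      You might also want to check your duplex settings. Normally GigE should end up at full duplex on both sides, with both ends configured the same way. (Forcing settings is riskier than it sounds: copper GigE, 1000BASE-T, requires autonegotiation, so the usual advice is to leave it enabled at both ends rather than forcing one side.)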

      One of the most common findings when an Ethernet connection is losing packets or is just slow is that one side was set to autonegotiate and wound up in half duplex while the other side ran full duplex. If both sides start to transmit at about the same time, the half-duplex side will see a collision, transmit a jamming signal, and assume that the other side is going to back off and retransmit... but the full-duplex side is not looking for collisions and will never retransmit, so that frame is simply lost and has to be recovered by TCP's much slower retransmission.
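      On a Linux host you can at least see what your own NIC negotiated: the kernel publishes the link speed (in Mb/s) and duplex as small files under /sys/class/net/<iface>/. A minimal sketch, assuming Linux; link_settings is a hypothetical helper name and eth0 an assumed interface. It says nothing about the switch end, which you have to check on the switch itself, but a GigE link that reports half here is a strong hint of the mismatch described above.

```perl
use strict;
use warnings;

# Hypothetical helper (Linux only): read the speed (Mb/s) and duplex that this
# host's NIC negotiated, from /sys/class/net/<iface>/{speed,duplex}.
# Values stay undef if a file is missing or unreadable (e.g. link down).
# $root can be overridden for testing.
sub link_settings {
    my ($iface, $root) = @_;
    $root //= '/sys/class/net';
    my %info = (speed => undef, duplex => undef);
    for my $attr (keys %info) {
        open my $fh, '<', "$root/$iface/$attr" or next;
        my $value = <$fh>;
        next unless defined $value;
        chomp $value;
        $info{$attr} = $value;
    }
    return \%info;
}

# Example:
# my $link = link_settings('eth0');
# warn "eth0 is not at full duplex\n" unless ($link->{duplex} // '') eq 'full';
```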

Re^2: Weird IO::Socket problem
by hallikpapa (Scribe) on Jan 14, 2008 at 19:23 UTC
    I will give this a shot. I appreciate the suggestion.