in reply to Re^3: Sniffer::HTTP problem with timeout
in thread Sniffer::HTTP problem with timeout

Well... after looking at and trying your replies (thank you very much for your help), it became clear that there was a random inconsistency in how things were running. I finally decided to run tcpdump to add a 'control' to the equation, and suddenly everything worked just as it was supposed to. I did notice that tcpdump reported a packet dropped by the kernel, though. I ran tcpdump a few more times and sometimes saw dropped packets and sometimes not. I looked at the system resources in top: the system was never less than 50% idle, tcpdump never used more than 1% of the CPU, and I saw irq/18-b43 which, while never over 4% CPU, registered -51 for priority. I thought priorities ran from -20 to +20. I have no idea how to verify whether the kernel is dropping packets when I am using your Sniffer::HTTP live, but I am reasonably sure that is what my problem is.

I guess what all this means is that it is not anything in my code. So the question now is how to stop the packet loss. I found some information about disk writes blocking tcpdump's ability to retrieve packets from the NIC before the buffer overflows, and a suggestion to increase the buffer size with
echo 4194304 > /proc/sys/net/core/rmem_max; echo 4194304 > /proc/sys/net/core/rmem_default
It didn't seem to help. Do you have any idea what might be causing the kernel packet loss and/or what I can do about it? I am running this on a couple of different P4 machines, one a 3.2G and the other a dual-core 2.4G, so I wouldn't think horsepower is the issue. Is there something that I should be doing differently on my system so that pcap works better?

Again, thank you very much for helping me find the problem.

Re^5: Sniffer::HTTP problem with timeout
by Corion (Patriarch) on Mar 20, 2011 at 21:24 UTC

    I'm sorry, but I have only very little knowledge of Linux kernels.

    In theory, TCP should be able to cope with the dropped packets, and Sniffer::HTTP should be able to reconstruct the TCP stream despite them. Maybe the per-packet callback code of Net::Pcap blocks your kernel for so long that it starts dropping packets, but as far as I'm aware, Net::Pcap runs asynchronously to the kernel. I'm not sure how you can easily verify that. Maybe putting a sleep in the callback helps reproduce the dropped packets.
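One way to try the sleep idea is to wrap the real packet handler in a callback that stalls for a fixed time before doing its work. This is only a sketch: `make_slow_callback` and the 50 ms delay in the usage comment are made-up for illustration and are not part of Net::Pcap or Sniffer::HTTP. The capture call itself is left as a comment because it needs a live device and, usually, root.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # core module; allows fractional-second sleeps

# Hypothetical helper (not part of Net::Pcap or Sniffer::HTTP):
# returns a callback that sleeps $delay seconds and then passes
# its arguments unchanged to $handler.
sub make_slow_callback {
    my ($delay, $handler) = @_;
    return sub {
        sleep($delay) if $delay > 0;
        return $handler->(@_);
    };
}

# Assumed usage with an already opened capture handle $pcap and a
# Sniffer::HTTP object $sniffer:
#
#   my $slow = make_slow_callback(0.05, sub {
#       my ($user_data, $hdr, $pkt) = @_;
#       $sniffer->handle_eth_packet($pkt);
#   });
#   Net::Pcap::pcap_loop($pcap, -1, $slow, "user data");
```

If the drop counter climbs as the delay grows, that would point at a slow callback rather than at the kernel or the driver.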

      Well, thank you again for helping me to locate the problem. I learned a few things in the process. I found something called Gulp that was written to address this problem by buffering, but it needs to be in the capture loop. I think that is beyond my ability, so I will continue to look for an alternative solution.
        OK, I've got this working pretty well. I found Net::Pcap::stats which let me see received and dropped packets and found that the problem was indeed dropped packets. I then switched from
        $sniffer->run(); # uses the "best" default device
        to feeding Sniffer from my own extremely simple Net::Pcap loop.
        my $err = '';
        my $dev = Net::Pcap::pcap_lookupdev(\$err);  # find a device

        # open the device for live listening
        my $pcap = Net::Pcap::pcap_open_live($dev, 4096, 0, 0, \$err);

        Net::Pcap::pcap_loop($pcap, -1, \&process_pkt, "user data");

        my %stats;
        $stats{ps_drop} = 0;

        sub process_pkt {
            my ($user_data, $hdr, $pkt) = @_;
            Net::Pcap::stats($pcap, \%stats);
            print "$stats{ps_drop} pkts drpd, $stats{ps_recv} pkts rcvd.\n";
            $sniffer->handle_eth_packet($pkt);
        }
        to try bypassing the grabber in Sniffer::HTTP. I found that I could go for hours without dropping a single packet. I then started looking at the code in sub run in Sniffer/HTTP.pm, and the only difference I could see (read: understand) was that I had set snaplen to 4096 when creating my capture device. I happen to know that what I am looking at is going to be smaller than that. I then changed only that in the Sniffer/HTTP.pm code, and now I can use the run method without getting dropped packets.
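The loop above queries the statistics and prints a line for every packet, which adds work to the callback itself. A possible variation is to report only every Nth packet. The sketch below assumes Perl 5.10 or later (for `//`); `drop_summary` and `every_nth` are hypothetical helper names, not Net::Pcap functions, and the interval of 1000 in the usage comment is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: format the hash that Net::Pcap::stats fills in.
# The percentage is drops relative to ps_recv; whether ps_recv already
# includes dropped packets differs between platforms, so treat it as a
# rough indicator only.
sub drop_summary {
    my ($stats) = @_;
    my $recv = $stats->{ps_recv} // 0;
    my $drop = $stats->{ps_drop} // 0;
    my $pct  = $recv ? 100 * $drop / $recv : 0;
    return sprintf "%d pkts drpd, %d pkts rcvd (%.2f%%)", $drop, $recv, $pct;
}

# Hypothetical helper: returns a closure that is true on every
# $every-th call and false otherwise.
sub every_nth {
    my ($every) = @_;
    my $count = 0;
    return sub { return (++$count % $every) == 0 };
}

# Assumed usage inside the capture callback from the post
# ($pcap, %stats and $sniffer as defined there):
#
#   my $due = every_nth(1000);
#   sub process_pkt {
#       my ($user_data, $hdr, $pkt) = @_;
#       if ($due->()) {
#           Net::Pcap::stats($pcap, \%stats);
#           print drop_summary(\%stats), "\n";
#       }
#       $sniffer->handle_eth_packet($pkt);
#   }
```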

        Now, I know that you wrote this to cover all reasonable scenarios, hence the big number. But what I don't understand is this: since snaplen is only supposed to be an upper limit, if the incoming packet is only 1440 bytes then the 128000 shouldn't even come into play, right? So why does dropping it to 4096 solve my dropped-packet problem?

        I realize that this is an issue with Net::Pcap, but I'm sure you know more about Net::Pcap than I do, and I would like to understand how and why what I did seems to have fixed this issue.

        Thanks for all your help and putting up with me.