in reply to Re^4: Sniffer::HTTP problem with timeout
in thread Sniffer::HTTP problem with timeout

I'm sorry, but I have only very little knowledge of Linux kernels.

In theory, TCP should be able to cope with the dropped packets, and Sniffer::HTTP should be able to reconstruct the TCP stream despite them. Maybe the per-packet callback code of Net::Pcap blocks for so long that the kernel starts dropping packets, but as far as I'm aware, Net::Pcap runs asynchronously to the kernel. I'm not sure how you can easily verify that - maybe adding a sleep in the callback helps reproduce the dropped packets.
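For what it's worth, here is a rough, untested sketch of that experiment: deliberately slow down the per-packet callback and watch whether the drop counter reported by Net::Pcap::stats starts climbing. The device lookup and the snaplen of 4096 are just placeholders for your setup.

    use strict;
    use warnings;
    use Net::Pcap;

    my $err  = '';
    my $dev  = Net::Pcap::lookupdev(\$err);          # pick a default capture device
    my $pcap = Net::Pcap::open_live($dev, 4096, 0, 0, \$err);

    my %stats;
    sub slow_callback {
        my ($user_data, $hdr, $pkt) = @_;
        sleep 1;                                     # deliberately stall the callback
        Net::Pcap::stats($pcap, \%stats);            # kernel-level recv/drop counters
        print "recv=$stats{ps_recv} drop=$stats{ps_drop}\n";
    }

    Net::Pcap::loop($pcap, -1, \&slow_callback, '');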

Re^6: Sniffer::HTTP problem with timeout
by ponley (Novice) on Mar 20, 2011 at 22:35 UTC
    Well, thank you again for helping me to locate the problem. I learned a few things in the process. I found something called Gulp that was written to address this problem by buffering, but it needs to be in the capture loop. I think that is beyond my ability, so I will continue to look for an alternative solution.
      OK, I've got this working pretty well. I found Net::Pcap::stats, which let me see received and dropped packets, and confirmed that the problem was indeed dropped packets. I then switched from
      $sniffer->run(); # uses the "best" default device
      to feeding Sniffer from my own extremely simple Net::Pcap loop, to try bypassing the grabber in Sniffer::HTTP:
      my $err = '';
      my $dev = Net::Pcap::pcap_lookupdev(\$err);              # find a device

      # open the device for live listening (snaplen 4096)
      my $pcap = Net::Pcap::pcap_open_live($dev, 4096, 0, 0, \$err);

      # set up the stats hash and callback before starting the blocking loop
      my %stats;
      $stats{ps_drop} = 0;

      sub process_pkt {
          my ($user_data, $hdr, $pkt) = @_;
          Net::Pcap::stats($pcap, \%stats);
          print "$stats{ps_drop} pkts drpd, $stats{ps_recv} pkts rcvd.\n";
          $sniffer->handle_eth_packet($pkt);
      }

      Net::Pcap::pcap_loop($pcap, -1, \&process_pkt, "user data");
      I found that I could go for hours without dropping a single packet. I then started looking at the code in sub run in Sniffer/HTTP.pm, and the only difference I could see (read: understand) was that I had set snaplen to 4096 when creating my capture device. I happen to know that what I am looking at is going to be smaller than that. I then changed only that in the Sniffer/HTTP.pm code, and now I can use the run method and not get dropped packets.

      Now, I know that you wrote this to cover all reasonable scenarios, hence the big number. But what I don't understand is this: since snaplen is only supposed to be an upper limit, if the incoming packet is only 1440 bytes the 128000 shouldn't even come into play, right? So why does dropping it to 4096 solve my dropped-packet problem?

      I realize that this is an issue with Net::Pcap, but I'm sure you know more about Net::Pcap than I do, and I would like to try to understand how and why what I did seems to have fixed this issue.

      Thanks for all your help and putting up with me.

        No, thank you for reporting back how you solved your problem. This means that I'll update Sniffer::HTTP to make the snapshot length configurable from the outside.

        My vague guess, not based on any evidence, is that libpcap perhaps allocates a fresh buffer, or keeps a (too small) pool of snaplen-byte buffers, for each call to its capture loop. If we call the pcap loop quickly enough, every buffer will only ever hold one packet, but maybe there are not enough buffers in the pool to store the packets that come in in the meantime. Alternatively, maybe I misunderstand libpcap: each buffer is only intended for one packet, and it doesn't keep a pool of buffers around. In that case the buffer of 128k is vastly oversized anyway.

        But that's all speculation - my approach will be to drop the default buffer size to 16k and to make the buffer size easily configurable from the outside when constructing the capture or calling ->run.
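
        Just to illustrate the direction (the option name and calling convention below are assumptions, not the released interface - check the Sniffer::HTTP documentation for the real spelling), a configurable snapshot length could look roughly like this:

          use Sniffer::HTTP;

          my $sniffer = Sniffer::HTTP->new(
              snaplen   => 4096,    # hypothetical option name for the snapshot length
              callbacks => {
                  response => sub {
                      my ($res, $req, $conn) = @_;
                      print $res->code, "\n";
                  },
              },
          );

          $sniffer->run();          # uses the "best" default device, as before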

        Update: Released as 0.20 onto CPAN. Upon rereading Net::Pcap, it seems that the snaplen is per-packet, so the generous size of 128000 should have led to errors much earlier.