in reply to dumping hashes to pcap files

* Did you test how many open files you can have simultaneously? Maybe it is enough for you to just dump to the files directly, or the limit can be raised until it is enough. If you store everything in a hash you have to think about the memory limit: do all the PCAP files fit into memory? If not, you may have to flush to file in between whenever memory is nearly full.

* I don't get why you would use random keys for the hash. If you use random keys you might as well use a simple array where the array index is the not-so-random "key". I'm talking about %sip, which should be @sip instead (or removed completely, see below).

* What you do in your final step (when there is already a key $callid in %main_hash) does not work. Whenever you do "$main_hash{$callid} = {increase() => $value}", you are overwriting (i.e. re-initializing) the previous inner hash with a brand-new hash, not adding a value! The right form would be "$main_hash{$callid}{increase()} = $value".

* I don't see any reason to insert into %main_hash in two steps. Why not add to %main_hash in process_sip itself instead of going through the extraneous %sip hash?
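To make the overwrite problem concrete, here is a minimal, self-contained sketch (the call-id and packet values are made up for illustration):

```perl
use strict;
use warnings;

my %main_hash;
my %fixed;

# Wrong: each assignment replaces the whole inner hash,
# so only the most recent packet survives.
$main_hash{'call-1'} = { 1 => 'pkt-a' };
$main_hash{'call-1'} = { 2 => 'pkt-b' };   # clobbers the entry above

# Right: assign into the inner hash, so entries accumulate.
$fixed{'call-1'}{1} = 'pkt-a';
$fixed{'call-1'}{2} = 'pkt-b';             # both keys now present
```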

Replies are listed 'Best First'.
Re^2: dumping hashes to pcap files
by bigmoose (Acolyte) on Dec 19, 2011 at 12:38 UTC
    * Did you test how many open files you can have simultaneously? Maybe it is enough for you to just dump to the files directly, or the limit can be raised until it is enough. If you store everything in a hash you have to think about the memory limit: do all the PCAP files fit into memory? If not, you may have to flush to file in between whenever memory is nearly full.

    - With regards to simultaneous opens... not really! In theory, the script will only ever have one pcap open for splitting at a time, and it will not open another pcap until it has finished splitting the current one. With regards to memory and storing everything in a hash, this had occurred to me, but having tested with pcaps as big as 400 MB it hasn't proved too much of a concern on a 4 GB RAM server! Good idea to prepare something just in case, though. :)

    * I don't get why you would use random keys for the hash. If you use random keys you might as well use a simple array where the array index is the not-so-random "key". I'm talking about %sip, which should be @sip instead (or removed completely, see below).

    This is a good point. I decided to use a hash because, in the event of having several VoIP calls in one pcap, I felt it would be easier to manage them as several keys within one hash as opposed to potentially hundreds of separate arrays! Furthermore, once I work out how to dump packets from hashes, it will be easier to "dump where key = callid" than to "dump where array is like".

    * What you do in your final step (when there is already a key $callid in %main_hash) does not work. Whenever you do "$main_hash{$callid} = {increase() => $value}", you are overwriting (i.e. re-initializing) the previous inner hash with a brand-new hash, not adding a value! The right form would be "$main_hash{$callid}{increase()} = $value".

    Doh, this is correct. I copied the wrong code in. Rest assured the rest of it is correct! Thanks for taking the time to point that out, though :)

    * I don't see any reason to insert into %main_hash in two steps. Why not add to %main_hash in process_sip itself instead of going through the extraneous %sip hash?

    I should have explained further! In a VoIP call the constituent parts are the RTP, SIP and (SIP/)SDP data. If I were to dump only to %sip, %sdp and %rtp hashes, then when I eventually dump it will be trickier to go to each hash, retrieve the relevant keys and dump them than it would be to go to one hash and ask for one key's worth of data.

    Thanks for looking at my code. You've raised some good points concerning hashes and memory... I didn't really think of that side! It would be easier if I could just append to pcaps than go through all this :)

      With regards to simultaneous opens.. not really! In theory, the script will only ever have one pcap open to split at one time.

      I'm not talking about the reading part, I'm talking about writing. Your only reason to read all that stuff into memory seems to be your dislike of closing and reopening output files a lot. Why not keep them open all the time until you are finished? Then it would be just a matter of writing each packet to the right file in the process_sip callback. To do that, use a hash with the callid as key and the filehandle as value. (Note: to use the filehandle with a print statement you first have to copy it to a scalar variable.) Here is a short example:

      # Writing $line to the appropriate file for $callid.
      # Filename is assumed to be "log.$callid" and is opened if not yet open.
      if (not exists $filehandle{$callid}) {
          open(my $fh, '>', 'log.' . $callid) or die $!;
          $filehandle{$callid} = $fh;
      }
      my $fh = $filehandle{$callid};
      print $fh $line;
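      A runnable version of this lazy-open pattern, including a cleanup step once processing is finished (the file names, call-ids and packet data here are made up for illustration):

```perl
use strict;
use warnings;

my %filehandle;

# Lazily open one log file per call-id, as in the example above.
for my $callid ('call-1', 'call-2') {
    if (not exists $filehandle{$callid}) {
        open(my $fh, '>', 'log.' . $callid) or die $!;
        $filehandle{$callid} = $fh;
    }
    my $fh = $filehandle{$callid};       # copy to a scalar for print
    print $fh "packet data for $callid\n";
}

# Once processing is finished, flush and close every handle.
for my $callid (keys %filehandle) {
    close $filehandle{$callid} or warn "close failed for $callid: $!";
}
```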
      This is a good point. I decided to use a hash because, in the event of having several VoIP calls in one pcap, I felt it would be easier to manage them as several keys within one hash as opposed to potentially hundreds of separate arrays! Furthermore, once I work out how to dump packets from hashes, it will be easier to "dump where key = callid" than to "dump where array is like".

      One hash instead of hundreds of arrays? @sip would be just one array, just as %sip is just one hash. The only difference is that instead of "$sip{increase()} = $packet;" you would write "push @sip, $packet;", and instead of "foreach $value (values %sip) {" you would write "foreach $value (@sip) {".
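      As a small sketch of that substitution (increase() here is a stand-in for the counter used in the thread, and the packet values are made up):

```perl
use strict;
use warnings;

# Hash version with meaningless numeric keys ...
my %sip;
my $counter = 0;
sub increase { return ++$counter }
$sip{increase()} = $_ for ('pkt-a', 'pkt-b', 'pkt-c');

# ... versus the array version: same data, and iterating @sip
# yields the packets in arrival order for free, whereas
# "values %sip" guarantees no particular order.
my @sip;
push @sip, $_ for ('pkt-a', 'pkt-b', 'pkt-c');
```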

      %main_hash is different; there the use of a hash is ideal. But again, the sub-hashes of %main_hash should be arrays instead of hashes (i.e. instead of a HashOfHashes you might better use a HashOfArrays). General rule: whenever you are tempted to use meaningless numbers (like random numbers) as keys in a hash, use an array instead.
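      For instance, the HashOfArrays form could look like this (a sketch; %main_hash and $callid follow the thread, the packet values are made up):

```perl
use strict;
use warnings;

my %main_hash;

# Key is the call-id, value is an array ref holding that call's
# packets in arrival order; no invented numeric keys needed.
for my $pkt ('invite', 'trying', 'ok') {
    my $callid = 'call-1';
    push @{ $main_hash{$callid} }, $pkt;   # autovivifies the array ref
}

# Dumping one call's packets, in order:
my @packets = @{ $main_hash{'call-1'} };
```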

      I should have explained further! In a VoIP call the constituent parts are the RTP, SIP and (SIP/)SDP data. If I were to dump only to %sip, %sdp and %rtp hashes, then when I eventually dump it will be trickier to go to each hash, retrieve the relevant keys and dump them than it would be to go to one hash and ask for one key's worth of data.

      I was not suggesting you add %sdp and %rtp hashes. I just looked at what your script is doing, and if I'm not missing something, what you do in two steps could just as easily be done in one. Even better: instead of getting all lines of the PCAP file in random order (which happens when you loop over a hash like %sip), the lines keep their order. In other words, why not do this:

      sub process_sip {
          my ($user_data, $header, $packet) = @_;
          my $asccidata = substr($packet, 42);
          my $sip_pkt   = Net::SIP::Packet->new_from_string($asccidata);
          my $callid    = $sip_pkt->get_header('call-id');
          if (exists $main_hash{$callid}) {
              ...

        jethro

        While this doesn't quite solve my ultimate problem (dumping to several pcaps at the same time), I've found your advice invaluable. Quite new to Perl and programming, so I really appreciate your patience and time.

        All you've said now makes sense! +rep

Re^2: dumping hashes to pcap files
by bigmoose (Acolyte) on Dec 19, 2011 at 13:15 UTC

    In addition, for anyone who may be looking at this: it appears that the dump function of the Net::Pcap module uses pcap_dump_open() from the pcap library.

    This is as opposed to using pcap_dump_fopen(), which appears to append to an existing pcap instead of opening a new one; see http://www.manpagez.com/man/3/pcap_dump_fopen/. Unfortunately Net::Pcap doesn't let you call this, as far as I can tell!

    Oh well. Perhaps this is as good a time as ever to look at XS :)