bigmoose has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Big question this is, so thanks to anyone who reads it all!

I'm trying to write a script that opens a pre-recorded PCAP file and, depending on whether certain packets meet certain requirements, dumps them to separate pcap files.

In particular, I am opening a VoIP pcap file, and the aim is to dump all SIP, SDP and RTP packets that belong to the same call into one file.

The problem comes when there are several VoIP calls in one pcap: as you loop through the main pcap, you effectively have to dump to several other pcaps, which means closing and re-opening dump files.

Further to this, while the pcap library does have an 'append to pcap' function, it hasn't been carried over to Net::Pcap, apparently with good reason.

My workaround was to store all packets in hashes: a master hash containing separate inner hashes labelled by SIP Call-ID. The keys of those inner hashes are just random numbers, and the values are the packet and header content.

Once the entire VoIP pcap has been looped through, dump from the hashes into separate pcaps.

I should have researched further, as I'm struggling to find a way to get the hashes into pcaps. Below is the code; does anyone care to weigh in? Has anyone hit a similar limitation with the Perl pcap library?

tl;dr: I'm assigning packets to a hash. Can I then dump the packets assigned to that hash to a pcap file?

my %sip;    # create sip hash to store all sip packets into
$run = 1;

# Search a directory for pcap files and list all matching into the dump list
# array. For each of those pcaps (unless it is the one with the current time
# in its name), open it and initiate the 'process_packet' subroutine for
# each packet.
while ($run == 1) {
    system('cls');
    my $shortTime = currentTime_short();
    my @dumpList = glob("C:/*.pcap");
    if (@dumpList) {
        foreach $dump (@dumpList) {
            if ($dump =~ m/$shortTime/) {
                system('cls');
                print "DO NOT TOUCH THIS FILE IT IS CURRENTLY BEING USED\n\n";
                sleep(10);
            }
            else {
                $pcap = pcap_open_offline($dump, \$err)
                    or die "Can't read '$dump': $err\n";
                pcap_loop($pcap, -1, \&process_packet, '');
                pcap_close($pcap);
                # call the mix_packets subroutine here
                mix_packets();
                print "done!\n";
                sleep(10);
            }
        }
    }
    else {
        print "No pcaps found. Trying again in 10 seconds.\n";
        sleep(10);
    }
}

# If the packet has RTP in it, initiate the RTP sub; if SIP, the SIP
# route, etc.
sub process_packet {
    my ($user_data, $header, $packet) = @_;
    if ($packet !~ m/sip/ && $packet !~ m/sdp/) {
        process_rtp(@_);
    }
    elsif ($packet =~ m/sip/) {
        if ($packet =~ m/sdp/) {
            process_sdp(@_);
        }
        else {
            process_sip(@_);
        }
    }
}

sub process_sip {
    my ($user_data, $header, $packet) = @_;
    # Assign every SIP packet to the %sip hash. increase() is a subroutine
    # that returns a new number each time it is called, to act as the key.
    $sip{increase()} = $packet;
}

sub mix_packets {
    my $callid;
    my %main_hash;
    # Having previously assigned all SIP packets from the current pcap to
    # values in the global %sip hash, we now go through each one and build
    # %main_hash, a hash of hashes keyed by Call-ID, with all packets
    # sharing the same Call-ID going under that key.
    foreach $value (values %sip) {
        my $asccidata = substr($value, 42);
        my $sip_pkt = Net::SIP::Packet->new_from_string($asccidata);
        my $callid = $sip_pkt->get_header('call-id');
        if (exists $main_hash{$callid}) {
            print "there is a key in the main hash with this packet's callid in it\n";
            $main_hash{$callid} = { increase() => $value };
        }
        else {
            print "there is no key in the main hash with this packet's callid in it\n";
            $main_hash{$callid} = { increase() => $value };
        }
    }
}

Re: dumping hashes to pcap files
by jethro (Monsignor) on Dec 19, 2011 at 12:18 UTC

    * Did you test how many open files you can have simultaneously? Maybe it is enough for you to just dump to the files directly; maybe the limit can be extended until it is enough. If you store everything in a hash, you have to think about the memory limit. Do all the PCAP files fit into memory? If not, you may have to write to file in between whenever memory is nearly full.

    * I don't get why you would use random keys for the hash. If you use random keys, you might as well use a simple array where the array index is the not-that-random "key". I'm talking about %sip, which should be @sip instead (or be removed completely, see below).

    * What you do in your final step (when there is already a key $callid in %main_hash) does not work. Whenever you do "$main_hash{$callid} = {increase() => $value}", you are overwriting, i.e. re-initializing, the previous hash there with a new hash, not adding a value! The right form would be "$main_hash{$callid}{increase()} = $value".

    * I don't see any reason to do that inserting into %main_hash in two steps. Why not add to %main_hash in process_sip itself, instead of using the extraneous %sip hash?

      * Did you test how many open files you can have simultaneously? Maybe it is enough for you to just dump to the files directly; maybe the limit can be extended until it is enough. If you store everything in a hash, you have to think about the memory limit. Do all the PCAP files fit into memory? If not, you may have to write to file in between whenever memory is nearly full.

      - With regard to simultaneous opens: not really! In theory, the script will only ever have one pcap open for splitting at a time, and it will not open another pcap until it has finished splitting the current one. With regard to memory and storing everything in a hash, that had occurred to me, but having tested with pcaps as big as 400 MB, it hasn't proved too much of a concern on a server with 4 GB of RAM! Good idea to prepare something just in case, though. :)

      * I don't get why you would use random keys for the hash. If you use random keys, you might as well use a simple array where the array index is the not-that-random "key". I'm talking about %sip, which should be @sip instead (or be removed completely, see below).

      This is a good point. I decided to use a hash because, in the event of several VoIP calls in one pcap, I felt it would be easier to manage them as several keys within one hash than as potentially hundreds of separate arrays! Furthermore, once I work out how to dump packets from hashes, it will be easier to 'dump where key = callid' than to 'dump where array is like'.

      What you do in your final step (when there is already a key $callid in %main_hash) does not work. Whenever you do "$main_hash{$callid} = {increase() => $value}", you are overwriting, i.e. re-initializing, the previous hash there with a new hash, not adding a value! The right form would be "$main_hash{$callid}{increase()} = $value".

      Doh, this is correct. I copied the wrong code in; rest assured the rest of it is correct! Thanks for taking the time to point that out, though :)

      * I don't see any reason to do that inserting into %main_hash in two steps. Why not add to %main_hash in process_sip itself, instead of using the extraneous %sip hash?

      I should have explained further! In a VoIP call, the constituents are the RTP, SIP and (SIP/)SDP data. If I were to dump only to %sip, %sdp and %rtp hashes, then when I eventually dump, it would be trickier to go to each hash, retrieve the relevant keys and then dump than it would be to go to one hash and ask for one key's worth of data.

      Thanks for looking at my code. You've raised some good points concerning hashes and memory; I didn't really think of that side! It would be easier if I could just append to pcaps than go through all this :)

        With regard to simultaneous opens: not really! In theory, the script will only ever have one pcap open for splitting at a time.

        I'm not talking about the reading part; I'm talking about writing. Your only reason for reading all that stuff into memory seems to be your dislike of closing and reopening output files a lot. Why not keep them open all the time until you are finished? Then it would just be a matter of writing each packet to the right file in the process_sip callback. To do that, just use a hash with the Call-ID as index and the filehandle as data. (Note: to use the filehandle in a print statement you first have to copy it to a scalar variable.) Here is a short example:

        # Writing $line to the appropriate file for $callid.
        # Filename is assumed to be "log.$callid" and is opened if not open.
        if (not exists $filehandle{$callid}) {
            open(my $fh, '>', 'log.' . $callid) or die $!;
            $filehandle{$callid} = $fh;
        }
        my $fh = $filehandle{$callid};
        print $fh $line;
        I decided to use a hash because, in the event of several VoIP calls in one pcap, I felt it would be easier to manage them as several keys within one hash than as potentially hundreds of separate arrays! Furthermore, once I work out how to dump packets from hashes, it will be easier to 'dump where key = callid' than to 'dump where array is like'.

        One hash instead of hundreds of arrays? @sip would be just one array, just as %sip is just one hash. The only difference would be that instead of "$sip{increase()} = $packet;" you would write "push @sip, $packet;", and instead of "foreach $value (values %sip) {" you would write "foreach $value (@sip) {".

        %main_hash is different; there the use of a hash is ideal. But again, the sub-hashes of %main_hash should be arrays instead of hashes (i.e. instead of a HashOfHashes you might better use a HashOfArrays). General rule: whenever you are tempted to use meaningless numbers (like random numbers) as keys in a hash, use an array instead.
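        A rough sketch of that HashOfArrays shape (untested; Perl's autovivification creates the inner array on the first push):

        # Append each packet to the list for its call. The array keeps the
        # packets in capture order, so no made-up numeric keys are needed.
        push @{ $main_hash{$callid} }, $packet;

        # Later, dump one call's packets in order:
        for my $pkt (@{ $main_hash{$callid} }) {
            # write $pkt out here
        }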

        I should have explained further! In a VoIP call, the constituents are the RTP, SIP and (SIP/)SDP data. If I were to dump only to %sip, %sdp and %rtp hashes, then when I eventually dump, it would be trickier to go to each hash, retrieve the relevant keys and then dump than it would be to go to one hash and ask for one key's worth of data.

        I was not suggesting you add %sdp and %rtp hashes. I just looked at what your script is doing, and if I'm not missing something, what you do in two steps could just as easily be done in one. Even better, instead of getting all the packets of the PCAP file in random order (which is what happens when you loop over a hash like %sip), the packets keep their order. In other words, why not do this:

        sub process_sip {
            my ($user_data, $header, $packet) = @_;
            my $asccidata = substr($packet, 42);
            my $sip_pkt = Net::SIP::Packet->new_from_string($asccidata);
            my $callid = $sip_pkt->get_header('call-id');
            if (exists $main_hash{$callid}) {
                ...

      In addition, for anyone who may be looking at this: it appears that the dump function of the Net::Pcap module uses pcap_dump_open() from the pcap library.

      This is as opposed to pcap_dump_fopen(), which appears to append to an existing pcap instead of opening a new one; see http://www.manpagez.com/man/3/pcap_dump_fopen/. Unfortunately Net::Pcap doesn't allow you to call this, as far as I can tell!
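      In the meantime, combining jethro's keep-the-filehandles-open idea with the dump functions Net::Pcap does expose seems workable: keep one dumper per Call-ID in a hash and write each packet out as it arrives. A rough sketch (untested; it assumes an Ethernet link type and that the $header hashref from the pcap_loop callback is still to hand, since pcap_dump() needs it):

      use Net::Pcap;

      my %dumper;    # Call-ID => pcap dumper handle

      sub dump_packet {
          my ($callid, $header, $packet) = @_;
          if (!exists $dumper{$callid}) {
              # A "dead" handle just supplies the link type and snap length
              # for the output file's header; no live capture is involved.
              my $dead = pcap_open_dead(DLT_EN10MB, 65535);
              # Note: real Call-IDs may need sanitising before use in a filename.
              $dumper{$callid} = pcap_dump_open($dead, "call_$callid.pcap")
                  or die "Can't open call_$callid.pcap\n";
          }
          # pcap_dump() takes the header (timestamps and lengths) as well as
          # the raw packet data, exactly as they arrive in the callback.
          pcap_dump($dumper{$callid}, $header, $packet);
      }

      # ... and once the input pcap is finished:
      pcap_dump_close($_) for values %dumper;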

      Oh well, perhaps this is as good a time as any to look at XS. :)

Re: dumping hashes to pcap files
by Marshall (Canon) on Dec 19, 2011 at 13:21 UTC
    In particular, I am opening a VoIP pcap file, and the aim is to dump all SIP, SDP and RTP packets that belong to the same call into one file.

    This is a question rather than an answer... Out of curiosity, what are you trying to do that a freeware app like Wireshark cannot? From what I understand, you can open a pcap file and Wireshark can actually play the audio of one of the calls. Just curious. Maybe controlling what Wireshark can already do might be the ticket?

      No problem!

      I want to have the option of choosing which databases to store data in, or perhaps even to prepare for a situation where I have to scale out to multiple servers. All distant goals, but at the end of the day, using the pcap library directly (which is also what Wireshark itself uses) gives you a bit more flexibility!

      It's worth noting that the library was built to be used from C. I'm using it via Perl, which is where some limitations are becoming apparent, I think.

        Oh, I see. Thanks for the explanation.

        I guess I'm going to be quite naive here, but it sounds like you have what I would call a "traffic cop" application. You open a pcap file and read a packet, then decide where it should go and direct that traffic there. Get the next packet, etc.

        I'm not quite understanding why there is a need to store any significant amount of data at all - I mean, why isn't it possible to just decide on the fly where each packet should go, rather than having to save them all for processing later?

        It sounds like the SIP packets determine when a call starts and when a call ends, and that you can assign some kind of Call-ID to that unique call. Further, the "in-between packets" can also be easily identified as belonging to a particular call.

        I don't know how many calls are in one pcap file, but it could be that you can just have filehandles open to all of them - it depends upon OS filehandle limits. Open a new file when you see a new call starting.

        You could use a hash to map call-ids to file handles. Something like this:

        #!/usr/bin/perl -w
        use strict;

        my %filehandles;
        foreach ('call1', 'call2') {
            open my $fh, '>>', $_ or die "can't open $_ for append $!";
            $filehandles{$_} = $fh;
        }

        # Use call_id in the print to select the right filehandle
        # to write to.
        my $call_id = "call1";
        print {$filehandles{$call_id}} "to file1\n";
        $call_id = "call2";
        print {$filehandles{$call_id}} "to file2\n";
        Just trying to be helpful.