Hi all,

I'm a Perl newbie and have a question about merging data and finding reversed IP pairs. I have a tab-separated list of network traffic containing the source IP, the destination IP and the bytes transferred between the peers, as follows:

Source IP        Destination IP    Bytes
10.0.0.24        93.188.134.219    32684
120.137.205.48   10.0.0.171        258
10.0.0.26        84.124.185.220    432
10.0.0.10        84.31.180.236     1476
84.31.180.236    10.0.0.10         4273

I need to aggregate the data (bytes) for each session, where a session is a source/destination IP pair regardless of direction (A to B counts the same as B to A). In the example data above, the last two lines should be aggregated as follows:

10.0.0.10        84.31.180.236     1476
84.31.180.236    10.0.0.10         4273

=>

10.0.0.10        84.31.180.236     5749

The order of the IPs doesn't matter. Finally the complete list of all data should be printed. Based on the above example the source data should finally be shown as:

10.0.0.24        93.188.134.219    32684
120.137.205.48   10.0.0.171        258
10.0.0.26        84.124.185.220    432
10.0.0.10        84.31.180.236     5749

I've created the following solution:

#!/usr/bin/perl
use strict;

my @lines;
open(D, $ARGV[0]) || die("Could not open file!\nUsage: $0 file ");
@lines = <D>;
close(D);

my %count;
foreach (@lines) {
    next if /^#|^(\s)*$/;
    chomp;
    my ($ipa, $ipb, $bytes) = split /\t\s?/;
    if((grep /$ipb/, %count) && (grep /$ipa/, (%{$count{$ipb}}))) {
        $count{$ipb}{$ipa}+=$bytes;
    } else {
        $count{$ipa}{$ipb}+=$bytes;
    }
}

foreach my $key(keys %count){
    foreach my $k(keys %{$count{$key}}){
        print "${key}\t$k\t$count{$key}->{$k}\n";
    }
}

That works well for a small amount of data (a few thousand lines) but is basically unusable for large amounts of data (I have over 77M lines to process).
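The slowdown most likely comes from the two grep calls: `grep /$ipb/, %count` walks every key and value of the hash for each input line, so the total work grows roughly with the square of the input size. The patterns are also unanchored, so 10.0.0.1 would match 10.0.0.10. A minimal sketch of one way around both problems, assuming the pair is put into a fixed order and used as a single hash key, so each line costs one hash lookup. The helper name aggregate_pairs, the numeric check that skips a header row, and the in-memory sample are assumptions for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum bytes per unordered IP pair, reading one line at a time.
# aggregate_pairs() is a hypothetical helper name used for this sketch.
sub aggregate_pairs {
    my ($fh) = @_;
    my %total;
    while (my $line = <$fh>) {
        next if $line =~ /^#|^\s*$/;          # skip comments and blank lines
        chomp $line;
        my ($ipa, $ipb, $bytes) = split /\t\s?/, $line;
        # Assumption: rows without a numeric byte count (e.g. a header) are skipped.
        next unless defined $bytes && $bytes =~ /^\d+$/;
        # Fixed (string) order, so A->B and B->A end up under the same key.
        ($ipa, $ipb) = ($ipb, $ipa) if $ipa gt $ipb;
        $total{"$ipa\t$ipb"} += $bytes;
    }
    return \%total;
}

# Sample rows from the post. For a real run, open the file named in
# $ARGV[0] and pass that filehandle instead.
my $sample = join '',
    "Source IP\tDestination IP\tBytes\n",
    "10.0.0.24\t93.188.134.219\t32684\n",
    "120.137.205.48\t10.0.0.171\t258\n",
    "10.0.0.26\t84.124.185.220\t432\n",
    "10.0.0.10\t84.31.180.236\t1476\n",
    "84.31.180.236\t10.0.0.10\t4273\n";

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
my $total = aggregate_pairs($in);
close $in;

# The key already holds "ipa<TAB>ipb", so only the sum is appended.
print "$_\t$total->{$_}\n" for sort keys %$total;
```

Because the pair is ordered by string comparison, a row may be printed with its two IPs swapped relative to the input, which should be acceptable since the order of the IPs doesn't matter. Memory use still grows with the number of distinct pairs, so whether 77M lines fit depends on how many unique sessions they contain.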

I have been struggling to find a proper solution for the issue for the last three days but haven't progressed much. I would highly appreciate any help on this one. Thanks in advance! :)

Br, -=Markus=-

Ps.

How can the same aggregation (of all data columns) be done for data containing multiple columns? For example:

Source IP        Destination IP    Bytes   Packets   Flows
10.0.0.10        84.31.180.236     1476    241       22
84.31.180.236    10.0.0.10         4273    15        3

=>

10.0.0.10        84.31.180.236     5749    256       25
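For the multi-column case, one possible approach is to keep an array of running sums per IP pair instead of a single number. The following is only a sketch under the same assumptions as before: tab-separated input, pairs ordered by string comparison, and non-numeric rows such as a header skipped. The helper name aggregate_columns and the in-memory sample are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum every data column per unordered IP pair.
# aggregate_columns() is a hypothetical helper name used for this sketch.
sub aggregate_columns {
    my ($fh) = @_;
    my %total;
    while (my $line = <$fh>) {
        next if $line =~ /^#|^\s*$/;          # skip comments and blank lines
        chomp $line;
        my ($ipa, $ipb, @values) = split /\t\s?/, $line;
        # Assumption: rows with missing or non-numeric columns (e.g. a header) are skipped.
        next unless @values && !grep { !/^\d+$/ } @values;
        # Fixed (string) order, so A->B and B->A end up under the same key.
        ($ipa, $ipb) = ($ipb, $ipa) if $ipa gt $ipb;
        my $sums = $total{"$ipa\t$ipb"} ||= [];
        $sums->[$_] += $values[$_] for 0 .. $#values;
    }
    return \%total;
}

# Sample rows from the post. For a real run, pass a filehandle opened on
# the data file instead.
my $sample = join '',
    "Source IP\tDestination IP\tBytes\tPackets\tFlows\n",
    "10.0.0.10\t84.31.180.236\t1476\t241\t22\n",
    "84.31.180.236\t10.0.0.10\t4273\t15\t3\n";

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
my $total = aggregate_columns($in);
close $in;

# One output row per pair: both IPs followed by the summed columns.
print join("\t", $_, @{ $total->{$_} }), "\n" for sort keys %$total;
```

The column count is not hard-coded, so the same loop handles Bytes alone or any number of additional numeric columns.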

In reply to How to merge data in IP address pairs by -=Markus=-
