Lavezzi has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

I have about 6 files full of traceroute data, which I need to analyze for changes (e.g. differences in the number of hops, how often the route changes).

The contents of the files currently look like this:

13,4.69.137.70
14,4.69.134.70
15,4.69.134.113
16,4.69.135.185
17,4.69.134.246
18,4.68.18.75
19,4.59.0.10
20,124.211.34.129
21,203.181.100.61
22,118.155.197.140
23,124.211.10.66
24,163.139.130.138
25,163.139.124.57
26,202.215.179.1
27,202.215.179.11

13,4.69.137.74
14,4.69.134.70
15,4.69.134.113
16,4.69.135.185
17,4.69.134.246
18,4.68.18.11
19,4.59.0.10
20,124.211.34.121
21,203.181.100.61
22,118.155.197.140
23,124.211.10.66
24,163.139.130.138
25,163.139.124.57
26,202.215.179.1
27,202.215.179.11

13,4.69.137.70
14,4.69.134.78
15,4.69.134.125
16,4.69.135.185
17,4.69.134.250
18,4.68.18.139
19,4.59.0.10
20,124.211.34.121
21,203.181.100.189
22,118.155.197.140
23,124.211.10.66
24,163.139.130.138
25,163.139.124.57
26,202.215.179.1
27,202.215.179.11

etc.

I need to write code in perl that will:

1. Read the first section of data (A)
2. Compare it to the next section of data (B)
3. Print to an output file if the route/IPs have changed
4. Store section (B) as the new reference
5. Compare it to the next section of data (C)
6. Repeat until the end of the file.

I have already searched Google to try to find a solution myself, but most of the questions I found are about comparing two files in Perl, whereas I only need to compare the data within a single file.

I tried to write the code myself, but I failed miserably. I was thinking along the lines of two loops that would alternate with each other after each line break, but to be honest I'm not sure that would be the best way to compare the data.

If anyone could help me come up with an idea or solution to comparing the data, I would be incredibly grateful.

Re: Best way to compare my data?
by Corion (Patriarch) on Mar 28, 2010 at 15:20 UTC

    Maybe you want to split up the file into several files, then compare the differences between those files? Along the way, you might find a way to avoid the "split into several files" part and compare the differences within the same file. I'm not sure where you are having problems, so if you show some code, maybe we can help you further.

    One possible approach would be to use "paragraph mode" (setting $/ to "") to read in the sections of your file, then split those sections into lines, and then find the differences between the sections. Maybe you could use Algorithm::Diff to give you a human-readable overview of what changed.
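A minimal sketch of the paragraph-mode approach in plain Perl, with no CPAN modules. It follows the six steps from the question: each section is compared with the one before it, and the current section then becomes the reference. It swaps Algorithm::Diff for a simple hash lookup, so it only reports lines that did not appear in the previous section, not where they moved. The in-memory $input is a hypothetical stand-in for the real log; for a real file you would open it by name instead (e.g. open my $in, '<', 'JPStream.csv'). The sub names read_sections and new_lines are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample input: the first three hops of the first two
# sections from the question, separated by a blank line.
my $input = <<'END';
13,4.69.137.70
14,4.69.134.70
15,4.69.134.113

13,4.69.137.74
14,4.69.134.70
15,4.69.134.113
END

# Read all sections from a filehandle in paragraph mode ($/ = "").
# Returns one array ref per section, holding that section's lines.
sub read_sections {
    my ($fh) = @_;
    local $/ = "";              # paragraph mode: blank line(s) end a record
    my @sections;
    while (my $para = <$fh>) {
        chomp $para;            # in paragraph mode chomp strips all trailing newlines
        push @sections, [ split /\n/, $para ];
    }
    return @sections;
}

# Lines of the current section that were not in the previous one,
# in their original order.
sub new_lines {
    my ($prev, $cur) = @_;
    my %in_prev = map { $_ => 1 } @$prev;
    return grep { !$in_prev{$_} } @$cur;
}

open my $in, '<', \$input or die "Can't open input: $!";
my @sections = read_sections($in);
close $in;

# Compare each section with the one before it; the current section
# then serves as the reference for the next comparison.
my @report;
for my $i (1 .. $#sections) {
    for my $line (new_lines($sections[$i - 1], $sections[$i])) {
        push @report, 'section ' . ($i + 1) . ": $line not in the previous traceroute";
    }
}
print "$_\n" for @report;
```

With the sample input this reports only the changed first hop of section 2.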

      I would prefer not to split it into separate files, because there would then be 196 files for each of the 6 logs I have.

      I have uploaded my code to my scratchpad, but obviously it doesn't work, that's why I'm here. I'm a complete beginner with Perl (and with coding in general; I only started last week), so please don't laugh at my effort, haha. Also, I hope the formatting is OK, since I'm new to that too!

        Why don't you post your code here instead?

        Instead of splitting up your source file manually, you could write a program that splits it up and then compares the parts. You could also skip the separate files altogether and compare the parts in memory, instead of writing them out to files and reading them back in.

        So far, I get the impression that you have not really put much thought into possible approaches. Maybe you shouldn't attack the problem as a whole, but instead simplify the problem first:

        • If you have two files, how do you determine their differences?
        • If you have one file with two sets of routes, how can you read that file into two memory structures?
        • If you have one file with more than two sets of routes, how will you determine the differences?
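One way to tackle the first two questions in memory, sketched under the assumption that every line has the form hop,ip: turn each section into a hash keyed by hop number, then walk the union of hop numbers and report changed, added and removed hops. A hop number can appear more than once in a section (the last route in GrandFather's sample data has two entries for hop 20), so this sketch joins repeated IPs with a slash instead of silently dropping one; that choice, and the names route_hash and compare_routes, are hypothetical.

```perl
use strict;
use warnings;

# Turn one section's "hop,ip" lines into a hash: hop number => IP.
# If a hop number repeats, the IPs are joined with "/" (illustrative choice).
sub route_hash {
    my (@lines) = @_;
    my %ip_for;
    for my $line (@lines) {
        my ($hop, $ip) = split /,/, $line, 2;
        $ip_for{$hop} = exists $ip_for{$hop} ? "$ip_for{$hop}/$ip" : $ip;
    }
    return \%ip_for;
}

# Compare two routes hop by hop and describe each difference.
sub compare_routes {
    my ($old, $new) = @_;
    my %all_hops = map { $_ => 1 } keys %$old, keys %$new;
    my @changes;
    for my $hop (sort { $a <=> $b } keys %all_hops) {
        if (!exists $new->{$hop}) {
            push @changes, "hop $hop removed (was $old->{$hop})";
        }
        elsif (!exists $old->{$hop}) {
            push @changes, "hop $hop added ($new->{$hop})";
        }
        elsif ($old->{$hop} ne $new->{$hop}) {
            push @changes, "hop $hop changed: $old->{$hop} -> $new->{$hop}";
        }
    }
    return @changes;
}

# Two shortened routes built from the question's data.
my $route_a = route_hash('13,4.69.137.70', '14,4.69.134.70', '15,4.69.134.113');
my $route_b = route_hash('13,4.69.137.74', '14,4.69.134.70');
my @changes = compare_routes($route_a, $route_b);
print "$_\n" for @changes;
```

Keying by hop number keeps the comparison independent of line order within a section, at the cost of treating a shifted route as many separate hop changes.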
        I have uploaded my code to my scratchpad, but obviously it doesn't work, that's why I'm here.

        You mean this code?

        #!/usr/bin/perl -w
        use strict;

        my $infile  = 'JPStream.csv';
        my $outfile = 'new1.csv';

        open IN,  "< $infile"  or die "Can't open $infile : $!";
        open OUT, "> $outfile" or die "Can't open $outfile : $!";

        my %seen;
        my %seen2;

        while (<IN>) {
            next if /^$/;
            chomp;
            if ( ! $seen2{$_} ) {
                print OUT "$_ Not in the last Traceroute ^\n";
            }
            last if /^$/;
            $seen{$_}++;
            %seen2 = ();
        }

        while (<IN>) {
            next if /^$/;
            chomp;
            if ( ! $seen{$_} ) {
                print OUT "$_ Not in the last Traceroute^^\n";
            }
            last if /^$/;
            $seen2{$_}++;
            %seen = ();
        }
        }

        Why not post your code in the same node as your question? It certainly isn't due to length.

        Please make it easier for us to help you help yourself.

        HTH,

        planetscape
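For comparison, here is one hedged way to make the quoted attempt work. As quoted, the trailing } has no matching {, which perl rejects as a syntax error. Beyond that, next if /^$/ runs before last if /^$/, so blank lines are skipped and the first while loop consumes the whole file; the second loop never executes. %seen2 is also emptied on every line, so every line would be reported. The sketch below keeps a single lookup hash for the previous section and a list for the current one, and swaps them at each blank line. The in-memory $input is a hypothetical stand-in for JPStream.csv, and results are collected in @out rather than printed to new1.csv.

```perl
use strict;
use warnings;

# Hypothetical stand-in for JPStream.csv: three short sections
# separated by blank lines.
my $input = "13,4.69.137.70\n14,4.69.134.70\n\n"
          . "13,4.69.137.74\n14,4.69.134.70\n\n"
          . "13,4.69.137.74\n14,4.69.134.78\n";

my @out;            # report lines (the original printed these to OUT)
my %prev;           # lookup table built from the previous section
my @cur;            # lines of the section currently being read
my $have_prev = 0;  # false until the first section has been stored

# Called at every section boundary: report lines that are new compared
# with the previous section, then make the current section the reference.
my $finish_section = sub {
    return unless @cur;                 # several blank lines in a row
    if ($have_prev) {
        for my $line (@cur) {
            push @out, "$line Not in the last Traceroute" unless $prev{$line};
        }
    }
    %prev = map { $_ => 1 } @cur;
    @cur  = ();
    $have_prev = 1;
};

open my $in, '<', \$input or die "Can't open input: $!";
while (my $line = <$in>) {
    chomp $line;
    if ($line eq '') {                  # blank line ends a section
        $finish_section->();
        next;
    }
    push @cur, $line;
}
$finish_section->();                    # the last section has no trailing blank line
close $in;

print "$_\n" for @out;
```

Only one loop reads the file, so there is no need to alternate between two loops; the "alternation" is just the swap of %prev and @cur at each boundary.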
Re: Best way to compare my data?
by Perlbotics (Archbishop) on Mar 28, 2010 at 17:35 UTC

    I understand that you need a hop-wise and section-wise comparison, so the following might get you started. However, the naive, text-based approach below will fail when the order of hops/IPs changes, e.g. because of topology changes or the selection of an alternative route with more or fewer intermediate hops. If that is also a concern for you, a real network model (a graph of nodes and edges) would be better suited than a plain text comparison. HTH

    IP#1     IP#2/2b   IP#3      IP#4          Section  Diff
    ================================================================================
    OK (alternate route):
    HOP1 --- HOP2 -------------- HOP3          (#1)
    HOP1 --- HOP2b ------------- HOP3          (#2)     --> @hop2: IP#2-->IP#2b

    EEK! (non-equidistant):
    HOP1 --- HOP2 -------------- HOP3          (#1)
    a) shorter route
    HOP1 ----------------------- HOP2          (#2a)    --> @hop2: IP#2-->IP#4 (err?)
    b) longer route
    HOP1 --- HOP2 ---- HOP3 ---- HOP4          (#2b)    --> @hop3: IP#4-->IP#3 (err?)

    use strict;
    use warnings;

    sub prettyip {
        my $ip = shift;
        $ip =~ s/ (\d+) / sprintf("%03d",$1) /smgex;
        return $ip;
    }

    my %last_seen_ip_from_hop;  # last IP seen for key=HOP
    my $section      = 1;       # section within the file
    my $previous_hop = 0;       # previous HOP / new section event

    while (my $line = <DATA>) {
        if ($line =~ /^(\d+),(\S+)/) {    # extract HOP and IP
            my ($hop, $ip) = ($1, $2);
            my $last_ip_seen = $last_seen_ip_from_hop{$hop};
            my ($changemark_pre, $changemark_pos) = ("", "");

            # detect a new section (A/B/C)
            $section++, print "\n" if $previous_hop > $hop;    # new file/section
            $previous_hop = $hop;

            # notify if a change occurred for a given hop since last seen
            if (defined $last_ip_seen and $ip ne $last_ip_seen) {
                $changemark_pre = 'changed to';
                $changemark_pos = '(was: ' . prettyip($last_ip_seen) . ')';
            }
            $last_seen_ip_from_hop{$hop} = $ip;    # init or update current HOP/IP

            printf "sect.%2d / hop %2d: %15s %15s %s\n",
                $section, $hop, $changemark_pre, prettyip($ip), $changemark_pos;
        }
    }

    __DATA__
    13,4.69.137.70
    14,4.69.134.70
    15,4.69.134.113
    16,4.69.135.185
    17,4.69.134.246
    18,4.68.18.75
    19,4.59.0.10
    20,124.211.34.129
    21,203.181.100.61
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11

    13,4.69.137.74
    14,4.69.134.70
    15,4.69.134.113
    16,4.69.135.185
    17,4.69.134.246
    18,4.68.18.11
    19,4.59.0.10
    20,124.211.34.121
    21,203.181.100.61
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11

    13,4.69.137.70
    14,4.69.134.78
    15,4.69.134.125
    16,4.69.135.185
    17,4.69.134.250
    18,4.68.18.139
    19,4.59.0.10
    20,124.211.34.121
    21,203.181.100.189
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11
    Output:
    sect. 1 / hop 13:                 004.069.137.070
    sect. 1 / hop 14:                 004.069.134.070
    sect. 1 / hop 15:                 004.069.134.113
    sect. 1 / hop 16:                 004.069.135.185
    sect. 1 / hop 17:                 004.069.134.246
    sect. 1 / hop 18:                 004.068.018.075
    sect. 1 / hop 19:                 004.059.000.010
    sect. 1 / hop 20:                 124.211.034.129
    sect. 1 / hop 21:                 203.181.100.061
    sect. 1 / hop 22:                 118.155.197.140
    sect. 1 / hop 23:                 124.211.010.066
    sect. 1 / hop 24:                 163.139.130.138
    sect. 1 / hop 25:                 163.139.124.057
    sect. 1 / hop 26:                 202.215.179.001
    sect. 1 / hop 27:                 202.215.179.011

    sect. 2 / hop 13:      changed to 004.069.137.074 (was: 004.069.137.070)
    sect. 2 / hop 14:                 004.069.134.070
    sect. 2 / hop 15:                 004.069.134.113
    sect. 2 / hop 16:                 004.069.135.185
    sect. 2 / hop 17:                 004.069.134.246
    sect. 2 / hop 18:      changed to 004.068.018.011 (was: 004.068.018.075)
    sect. 2 / hop 19:                 004.059.000.010
    sect. 2 / hop 20:      changed to 124.211.034.121 (was: 124.211.034.129)
    sect. 2 / hop 21:                 203.181.100.061
    sect. 2 / hop 22:                 118.155.197.140
    sect. 2 / hop 23:                 124.211.010.066
    sect. 2 / hop 24:                 163.139.130.138
    sect. 2 / hop 25:                 163.139.124.057
    sect. 2 / hop 26:                 202.215.179.001
    sect. 2 / hop 27:                 202.215.179.011

    sect. 3 / hop 13:      changed to 004.069.137.070 (was: 004.069.137.074)
    sect. 3 / hop 14:      changed to 004.069.134.078 (was: 004.069.134.070)
    sect. 3 / hop 15:      changed to 004.069.134.125 (was: 004.069.134.113)
    sect. 3 / hop 16:                 004.069.135.185
    sect. 3 / hop 17:      changed to 004.069.134.250 (was: 004.069.134.246)
    sect. 3 / hop 18:      changed to 004.068.018.139 (was: 004.068.018.011)
    sect. 3 / hop 19:                 004.059.000.010
    sect. 3 / hop 20:                 124.211.034.121
    sect. 3 / hop 21:      changed to 203.181.100.189 (was: 203.181.100.061)
    sect. 3 / hop 22:                 118.155.197.140
    sect. 3 / hop 23:                 124.211.010.066
    sect. 3 / hop 24:                 163.139.130.138
    sect. 3 / hop 25:                 163.139.124.057
    sect. 3 / hop 26:                 202.215.179.001
    sect. 3 / hop 27:                 202.215.179.011
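Following up on the suggestion of a real network model, here is a hedged sketch of the simplest version of that idea: treat each route as a set of edges between consecutive IPs and compare edge sets instead of hop positions. An extra hop then shows up as two new edges and one missing edge, instead of shifting every later hop. The address 10.0.0.1 is a made-up intermediate hop used only for illustration, and route_edges and edge_changes are hypothetical names.

```perl
use strict;
use warnings;

# Edges of a route: each pair of consecutive IPs, written "from>to".
sub route_edges {
    my (@ips) = @_;
    my @edges;
    for my $i (1 .. $#ips) {
        push @edges, "$ips[$i - 1]>$ips[$i]";
    }
    return @edges;
}

# Compare two routes as sets of edges rather than hop positions.
# Returns array refs of edges that appeared and edges that disappeared.
sub edge_changes {
    my ($old_route, $new_route) = @_;
    my %in_old = map { $_ => 1 } route_edges(@$old_route);
    my %in_new = map { $_ => 1 } route_edges(@$new_route);
    my @added   = grep { !$in_old{$_} } route_edges(@$new_route);
    my @removed = grep { !$in_new{$_} } route_edges(@$old_route);
    return (\@added, \@removed);
}

# Hops 17-20 of the first section from the question, and the same route
# with a made-up extra hop (10.0.0.1) inserted after hop 18.
my @before = qw(4.69.134.246 4.68.18.75 4.59.0.10 124.211.34.129);
my @after  = qw(4.69.134.246 4.68.18.75 10.0.0.1 4.59.0.10 124.211.34.129);

my ($added, $removed) = edge_changes(\@before, \@after);
print "added:   $_\n" for @$added;
print "removed: $_\n" for @$removed;
```

A position-by-position comparison of the same two routes would flag every hop from 19 onward as changed, whereas the edge comparison localizes the change to the inserted hop.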
Re: Best way to compare my data?
by GrandFather (Saint) on Mar 28, 2010 at 19:48 UTC

    Parse the file into individual route blocks so that you end up with an array (routes) of arrays (nodes in a route). Then use Algorithm::Diff to compare pairs of routes to pull out the difference information you require.

    If you had shown us your code, I would have shown you mine.


    True laziness is hard work
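A sketch of the array-of-arrays idea, assuming well-formed hop,ip lines with blank lines between routes. It swaps Algorithm::Diff for a plain equality test, so it only tells whether two routes differ, not how. It compares each route with the one immediately before it, as the question asks, and counts route changes, split into length changes and same-length hop changes. parse_routes and count_changes are hypothetical names, and the blocks are split on blank lines in a string rather than read with $/.

```perl
use strict;
use warnings;

# Split text into routes (blank-line separated) and keep only the IP of
# each "hop,ip" line: an array (routes) of arrays (nodes in a route).
sub parse_routes {
    my ($text) = @_;
    my @routes;
    for my $block (split /\n\s*\n/, $text) {
        my @ips = map { (split /,/, $_, 2)[1] } grep { /\S/ } split /\n/, $block;
        push @routes, \@ips if @ips;
    }
    return @routes;
}

# Compare each route with the one before it and count how often the
# route changed, split into length changes and same-length changes.
sub count_changes {
    my (@routes) = @_;
    my %count = (route_changes => 0, length_changes => 0, hop_changes => 0);
    for my $i (1 .. $#routes) {
        my ($prev, $cur) = @routes[$i - 1, $i];
        next if "@$prev" eq "@$cur";     # identical route: nothing changed
        $count{route_changes}++;
        if (@$prev != @$cur) { $count{length_changes}++ }
        else                 { $count{hop_changes}++ }
    }
    return %count;
}

# Hypothetical sample: four short routes built from the question's data.
my $text = <<'END';
13,4.69.137.70
14,4.69.134.70

13,4.69.137.74
14,4.69.134.70

13,4.69.137.74
14,4.69.134.70

13,4.69.137.74
14,4.69.134.70
15,4.69.134.113
END

my @routes = parse_routes($text);
my %count  = count_changes(@routes);
printf "%d routes, %d route changes (%d length, %d hop)\n",
    scalar @routes, @count{qw(route_changes length_changes hop_changes)};
```

For the sample this prints "4 routes, 2 route changes (1 length, 1 hop)": route 3 repeats route 2 and is not counted.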

      Sorry, I read in the FAQ that it was best to post code in the scratchpad so it was easier for other users to see!

      Most of these suggestions are going way over my head; as I said, I'm not well versed in Perl at all!

        Scratchpads are transitory. Code is best placed in its relevant thread for the benefit of those who come days, weeks, months after the scratchpad has been altered or cleared.

        I think if you carefully re-read the FAQ, it actually suggests posting code on your scratchpad if you have asked (or are about to ask) a longish question in the Chatterbox.

        HTH,

        planetscape
Re: Best way to compare my data?
by GrandFather (Saint) on Mar 29, 2010 at 20:42 UTC

    The following uses Algorithm::Diff to do the heavy lifting:

    use strict;
    use warnings;
    use Algorithm::Diff;

    my @routes;

    local $/ = "\n\n";
    push @routes, $_ while $_ = <DATA>;
    chomp @routes;
    @routes = map {[split "\n"]} @routes;

    my @reference  = @{shift @routes};
    my $lenChanges = 0;
    my $hopChanges = 0;

    for my $route (@routes) {
        my @diffs = Algorithm::Diff::diff(\@reference, \@$route);

        next if !@diffs;
        @reference != @$route ? ++$lenChanges : ++$hopChanges;
    }

    print "Length changes: $lenChanges\n";
    print "Hop changes: $hopChanges\n";

    __DATA__
    13,4.69.137.70
    14,4.69.134.70
    15,4.69.134.113
    16,4.69.135.185
    17,4.69.134.246
    18,4.68.18.75
    19,4.59.0.10
    20,124.211.34.129
    21,203.181.100.61
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11

    13,4.69.137.74
    14,4.69.134.70
    15,4.69.134.113
    16,4.69.135.185
    17,4.69.134.246
    18,4.68.18.11
    19,4.59.0.10
    20,124.211.34.121
    21,203.181.100.61
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11

    13,4.69.137.70
    14,4.69.134.78
    15,4.69.134.125
    16,4.69.135.185
    17,4.69.134.250
    18,4.68.18.139
    19,4.59.0.10
    20,124.211.34.121
    21,203.181.100.189
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11

    13,4.69.137.74
    14,4.69.134.70
    15,4.69.134.113
    16,4.69.135.185
    17,4.69.134.246
    18,4.68.18.11
    19,4.59.0.10
    20,124.211.10.120
    20,124.211.26.120
    21,203.181.100.61
    22,118.155.197.140
    23,124.211.10.66
    24,163.139.130.138
    25,163.139.124.57
    26,202.215.179.1
    27,202.215.179.11

    Prints:

    Length changes: 1
    Hop changes: 2

    Note too the use of the $/ record separator special variable to ease the parsing of the file into records.


    True laziness is hard work
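A small demonstration of why the choice of $/ matters, using a hypothetical input with two blank lines between routes. With the fixed separator "\n\n", the extra blank line stays attached to the start of the next record, so split "\n" produces an empty first field. Paragraph mode ($/ = "") treats any run of blank lines as a single separator, and chomp in that mode removes all trailing newlines. The helper name records is illustrative only.

```perl
use strict;
use warnings;

# Read every record of a string with the given $/ and chomp it.
# chomp removes whatever $/ currently holds, so it sees the local value.
sub records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory text: $!";
    local $/ = $separator;
    my @recs;
    while (my $rec = <$fh>) {
        chomp $rec;
        push @recs, $rec;
    }
    close $fh;
    return @recs;
}

# Two routes separated by TWO blank lines (hypothetical input).
my $text = "13,4.69.137.70\n14,4.69.134.70\n\n\n"
         . "13,4.69.137.74\n14,4.69.134.70\n";

my @fixed = records($text, "\n\n");   # fixed two-newline separator
my @para  = records($text, "");       # paragraph mode

# The fixed separator leaves a newline at the start of the second record,
# so splitting it on "\n" yields an empty first field.
my @hops_fixed = split "\n", $fixed[1];
my @hops_para  = split "\n", $para[1];
printf "fixed: %d fields, paragraph mode: %d fields\n",
    scalar @hops_fixed, scalar @hops_para;
```

This prints "fixed: 3 fields, paragraph mode: 2 fields", so paragraph mode is the more forgiving choice when the spacing between sections is not guaranteed.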

      Just wanted to say that I very much appreciate all of the help that has been given to me in this thread.

      The reason I haven't replied is that I don't have time to look at this project at the moment, as I have more urgent work to deal with in the meantime! So thanks again!