Re: Algorithom to find overlaping subnets (Internet IPv4)

FWIW: This processes a list of 50e3 randomly generated CIDRs in < 3 minutes:

#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ];

use enum qw[ CIDR NETWORK SIZE RANGE ];
use enum qw[ FIRST LAST ];

sub dd2n { unpack 'N', pack 'C4', split '\.', $_[0] }
sub n2dd { join '.', unpack 'C4', pack 'N', $_[0]   }


sub CIDR2Range {
    my( $network, $bits ) = $_[ 0 ] =~ m[([^/]+)/(\d+)]
        or die "Bad CIDR: '$_[0]'";
    my $start =  dd2n( $network );
    my $hostMask = ( 1 << ( 32 - $bits ) ) -1;
    die "Bad CIDR: '$_[0]'" if $start & $hostMask;
    my $end = $start | $hostMask;
    return [ $start, $end ];
}

sub rangesOverlap {
    my( $thisFirst, $thisLast ) = @{ $_[0] };
    my( $thatFirst, $thatLast ) = @{ $_[1] };
    return 1 unless
        $thisFirst > $thatLast or $thatFirst > $thisLast;
}

sub isContainedBy {
    my( $thisFirst, $thisLast ) = @{ $_[0] };
    my( $thatFirst, $thatLast ) = @{ $_[1] };
    return 1
        if  $thisFirst <= $thatFirst
        and $thisLast  >= $thatLast;
}

my $start = time;

chomp( my @CIDRs = <> );

my $count = @CIDRs;

@CIDRs = sort {
    $b->[SIZE] <=> $a->[SIZE] || $b->[RANGE][FIRST] <=> $a->[RANGE][FI
+RST]
} map[
    $_, m[([^/]+)/(\d+)$], CIDR2Range( $_ )
], @CIDRs;

#pp \@CIDRs;

my @forest;

OUTER: while( @CIDRs ) {
    my $next = pop @CIDRs;
    my $that = $next->[RANGE];
    for my $tree ( @forest ) {
        my $this = $tree->{root}[RANGE];
        if( isContainedBy( $this, $that ) ) {
            push @{ $tree->{contains} }, $next;
            next OUTER;
        }
    }
    push @forest, { root => $next };
}

@forest = sort{
    $a->{root}[RANGE][FIRST] <=> $b->{root}[RANGE][FIRST]
} @forest;

for my $tree ( @forest ) {
    print $tree->{root}[CIDR];
    print "\t", join ' ', map $_->[CIDR], @{ $tree->{contains} };
}

printf STDERR "Took %.3f seconds for $count CIDRs\n",
    time() - $start;
[download]

The (truncated) output produced is:

0.0.0.0/12
    0.10.0.0/15 0.1.0.0/16 0.7.0.0/16 0.12.0.0/16 0.11.0.0/17 0.4.128.
+0/19 0.11.160.0/19 0.12.72.0/21 0.2.132.0/22 0.3.184.0/22 0.15.254.0/
+24 0.14.150.128/26
0.16.128.0/17
    
0.17.0.0/16
    
0.18.163.128/26
    
0.19.237.0/24
    
0.24.16.0/20
    
0.24.104.0/21
    
0.25.0.0/19
    
0.25.144.0/22
    
0.27.0.0/16
    
0.28.190.0/23
    
0.28.203.128/25
    
0.29.0.0/16
    0.29.64.0/20
0.30.240.0/20
    
0.31.157.160/27
    
0.34.0.0/16
    
0.35.0.0/18
    
0.37.252.0/22
    
0.38.64.0/22
    
0.40.0.0/13
    0.40.0.0/15 0.40.0.0/16 0.47.0.0/16 0.42.0.0/18 0.44.128.0/18 0.47
+.192.0/18 0.43.0.0/19 0.47.64.0/21 0.47.192.0/21
0.48.0.0/13
    0.54.0.0/15 0.55.128.0/18 0.48.160.0/19 0.52.224.0/19 0.53.96.0/19
+ 0.55.32.0/20 0.49.124.0/22 0.51.241.128/26
0.56.168.0/23
    
0.58.0.0/21
...
[download]

I produced the datasets using this generator script:

#! perl -slw
use strict;
use Math::Random::MT qw[ rand srand ];

sub n2dd { join '.', unpack 'C4', pack 'N', $_[0]   }

our $N //= 50e3;
our $S //= 1;

srand( $S );

my %uniq;

for( 1 .. $N ) {
    my $bits = 8 + int( rand( 8 ) + rand( 8 ) + rand( 8 ) );
    my $ip   = int( rand( 2 **32 ) );
    $ip &= ~( ( 1 << ( 32 - $bits ) ) -1 );
    $ip = n2dd( $ip );
    my $cidr = "$ip/$bits";
    redo if exists $uniq{ $cidr };
    $uniq{ $cidr } = undef;
    print $cidr;
}
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Re: Algorithom to find overlaping subnets (Internet IPv4) Select or Download Code

Replies are listed 'Best First'.
Re^2: Algorithom to find overlaping subnets (Internet IPv4) by jimpudar (Pilgrim) on Apr 11, 2018 at 05:33 UTC
I recently had to work on this problem, and needed a fast implementation. I found the easiest and fastest way was to use a Trie as mentioned by rg0now. I used the random generator which BrowserUk supplied to generate the random dataset. My version processed the dataset in less than two seconds on my 3.8GHz Linux box. I don't have nice sorted output though. It should be trivial to build up a data structure as you go if you need sorted output... Here's the code: #! /usr/bin/env perl use strict; use warnings; use autodie; use 5.10.0; use Tree::Trie; use Net::IP qw(ip_splitprefix ip_iptobin ip_get_mask); my $trie = Tree::Trie->new(); while (<>) { chomp; my ($ip, $len) = ip_splitprefix($_); my $binip = ip_iptobin($_,4); my $trie_entry = substr($binip, 0, $len); $trie->deepsearch('prefix'); my $data = $trie->lookup_data($trie_entry); if ($data) { say "'$_' is contained by or equals '$data'"; } else { $trie->deepsearch('choose'); my $data = $trie->lookup_data($trie_entry); if ($data) { say "'$_' contains '$data'"; } } $trie->add_data($trie_entry, $_); } [download] Some truncated sample output: time ./trie_overlap.pl <50k_ips ... '116.15.84.0/23' is contained by or equals '116.14.0.0/15' '108.245.172.0/24' is contained by or equals '108.244.0.0/14' '13.238.173.128/25' is contained by or equals '13.224.0.0/11' '37.179.224.0/19' is contained by or equals '37.179.192.0/18' '73.147.0.0/17' is contained by or equals '73.144.0.0/12' '150.152.0.0/13' contains '150.154.0.0/18' '171.93.16.0/22' is contained by or equals '171.0.0.0/9' '80.246.128.0/21' is contained by or equals '80.240.0.0/13' real 0m1.957s user 0m1.832s sys 0m0.124s [download] Anyone see any issues with this? It hasn't been completely battle tested yet :) Best, Jim	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^2: Algorithom to find overlaping subnets (Internet IPv4)
by jimpudar (Pilgrim) on Apr 11, 2018 at 05:33 UTC

I recently had to work on this problem, and needed a fast implementation.

I found the easiest and fastest way was to use a Trie as mentioned by rg0now.

I used the random generator which BrowserUk supplied to generate the random dataset.

My version processed the dataset in less than two seconds on my 3.8GHz Linux box.

I don't have nice sorted output though. It should be trivial to build up a data structure as you go if you need sorted output...

Here's the code:

#! /usr/bin/env perl

use strict;
use warnings;
use autodie;
use 5.10.0;

use Tree::Trie;
use Net::IP qw(ip_splitprefix ip_iptobin ip_get_mask);

my $trie = Tree::Trie->new();

while (<>) {
    chomp;
    my ($ip, $len) = ip_splitprefix($_);

    my $binip = ip_iptobin($_,4);

    my $trie_entry = substr($binip, 0, $len);

    $trie->deepsearch('prefix');
    my $data = $trie->lookup_data($trie_entry);

    if ($data) {
        say "'$_' is contained by or equals '$data'";
    }
    else {
        $trie->deepsearch('choose');
        my $data = $trie->lookup_data($trie_entry);

        if ($data) {
            say "'$_' contains '$data'";
        }
    }
    $trie->add_data($trie_entry, $_);
}
[download]

Some truncated sample output:

time ./trie_overlap.pl <50k_ips

...

'116.15.84.0/23' is contained by or equals '116.14.0.0/15'
'108.245.172.0/24' is contained by or equals '108.244.0.0/14'
'13.238.173.128/25' is contained by or equals '13.224.0.0/11'
'37.179.224.0/19' is contained by or equals '37.179.192.0/18'
'73.147.0.0/17' is contained by or equals '73.144.0.0/12'
'150.152.0.0/13' contains '150.154.0.0/18'
'171.93.16.0/22' is contained by or equals '171.0.0.0/9'
'80.246.128.0/21' is contained by or equals '80.240.0.0/13'

real    0m1.957s
user    0m1.832s
sys     0m0.124s
[download]

Anyone see any issues with this? It hasn't been completely battle tested yet :)

Best,

Jim

[reply]
[d/l]
[select]