This is a fast storage and lookup method to match IP-addresses against IP-prefixes (or networks).

In a recent SoPW, Fangorn asked about caching of Net::Patricia objects, to make that cache available to multiple processes.

I had a brief look at Net::Patricia and what all that was about and thought "hmm, tree lookup? don't we have that close to the perl core?"

Turns out that DB_File, using a database of type DB_BTREE, fits that purpose very well. Memory requirements are minimal, and lookup time is no more than 5 micromiliseconds worst case (tested with 1.000.000 randomly generated prefixes).

DB_BTREE provides partial match on keys of an ordered binary tree. To use that feature effectively, network addresses are converted into bit strings, zipped (as in fly zip) with their prefix netmask bit string. The resulting string is used as key to the DB_BTREE table.

Example: having the CIDR notation 10.223.2.0/23, we get

net: 0 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 key: 1010101011101110111110111111111110101010101011000000000000000000

We get thus a 64 bit key (well, 64 char string :-). Upon search, the searched IP adress is zipped with the netmask 0xffffffff in the same manner, and starting with the leftmost 2 bits, a partial match against the tree is done. DB_BTREE returns the next key which is equal or greater than the key at hand. The resulting key is splitted back into network and netmask, and it is checked whether the ip matches the resulting range. If so, the value is pushed to a list. While the search key matches the returned key, further bits (2 at a time) are added. The search is repeated until there's a mismatch or no new key is returned, which means we're done.

The last element in the resulting list is the network prefix matching the given IP, with the most bits set in its mask.

#!/usr/bin/perl use DB_File; use Fcntl; use Getopt::Std; # 120000 2.217429, avg. 16 lookups, 360.434319750957/sec my $maxprefix = 1000000; # ip prefix limit my %o; getopts('f:ast',\%o); unless (@ARGV || grep { $_ } @o{qw(t s)}) { die <<EOH; usage: $0 -s [-f file] | [-a ] [ -t | ipaddress ] -s : setup -f : use values from file for setup ("xxx.xxx.xxx.xxx/nn nnnnn") -t : average timing -a : show all matching prefixes EOH } # unless @ARGV || grep {defined $_ } values %o; my $filename = 'astable.db'; my $x; # DB_File handle if ($o{s}) { if(! -e $filename) { $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_B +TREE or die "cannot open $filename: $!\n"; } else { die "file '$filename' exists, won't overwrite!\n"; } } else { $x = tie %h, "DB_File", $filename, O_RDONLY, 0666, $DB_BTREE or die "cannot open $filename: $!\n"; } # set up AS table. To perform a binary search, # we build a bit string as a sequence of alternating bits # of mask and network address, i.e. having 10.223.2.0/23, we get # # net: 0 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 + 0 # mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 +0 # key: 10101010111011101111101111111111101010101010110000000000000000 +00 # # for ip searches, we set the "mask part bits" of an ip all to 1, and +do # a incremental partial match against the keys in the table. if ($o{s}) { # setup # generate some random IP addresses and masks # set all non-mask bits to 0 and store the generated # IP address (minus 1, if last byte is non-zero) # in the files 'ips' for further use in lookup timing unless (-f 'ips') { open IP,'>', 'ips'; } else { die "file 'ips' exist, won't overwrite!\n"; } my $have_file; if ($o{f}) { $have_file = open I,'<', $o{f} or die "Can't read '$o{f}': $!\ +n"; } my $c; while ($. <= $maxprefix) { my (@bytes,$mask,$value); if($have_file) { $_ = <I> or last; chomp; (@bytes[0..3],$mask,$value) = m!(\d+).(\d+).(\d+).(\d+)/(\ +d+)\s+(.*)!; } else { @bytes = map { int rand 255 } 0..3; $mask = int rand (30) + 1; # no null mask :) } my @nbits = map {split//,unpack "B*",pack "C",$_ } @bytes; my @mbits = ((1) x $mask, (0) x (32 - $mask)); my $key = ''; my @b = (); for (0..31) { my $mbit = shift @mbits; my $n = shift @nbits; my $nbit = $mbit ? $n : 0; $key .= $mbit . $nbit; push @b, $nbit; } redo if $h{$key}; my $prefix = join('.',unpack "C4",pack"B32",join'',@b).'/'.$ma +sk; $bytes[3] += 1 if $bytes[3] == 0; # XXX should check network # boundaries here... my $ip = join('.',@bytes); print IP $ip,"\n"; $h{$key} = $value || $prefix; $.++ unless $have_file; } untie %h; exit 0; } #setup my $c; # lookup counter if($o{t}) { # average open I, '<', 'ips'; eval { require Time::HiRes }; $@ and die "Sorry, no average here - Time::HiRes missing"; Time::HiRes->import(qw(gettimeofday tv_interval)); my $t0 = [gettimeofday()]; my $t1 = [gettimeofday()]; my $l; while(<I>) { chomp; my $net = match($_); $l += $c; unless($. % 1000) { my $e = tv_interval ( $t1, [gettimeofday()]); my $g = tv_interval ( $t0, [gettimeofday()]); print "$. $e, avg. ",int($l/$.); print ' lookups, ',$./$g,"/sec\n"; $t1 = [gettimeofday()]; } } my $elapsed = tv_interval ( $t0, [gettimeofday()]); print "elapsed: $elapsed\n"; print "avg. lookup time: ",$elapsed / $.,"\n"; } else { /^\d+\.\d+\.\d+\.\d+$/ and print "$_ => ",match($_),"\n" for @ARGV +; } sub match { my $ip = shift; my $packedip = pack "C4",split/\./,$ip; my $numericip = unpack "N", $packedip; my @bits = split //,unpack "B*",$packedip; # reset cursor. $x->seq(my $foo, my $bar, R_FIRST); # now search until mismatch. my ($key, $ok, $v, @net); $c = 1; # lookup counter for(@bits) { my $lk = $key .= 1 . $_; #print "$key\n"; next if $ok and $ok =~ /^$key/; # shortcut $x->seq($lk,$v,R_CURSOR); # check if this key is a candidate push @net, $v if $v and in_range($numericip, $lk); unless ($lk =~ /^$key/ && $v && length ($lk) == 64) { if ($o{a}) { print " $_\n" for @net; } print "$c lookups - " if wantarray; return pop @net; # return net with longest mask } $ok = $lk; $c++; die "more than 32 lookups" if $c > 32; } # only one or no prefix found print "$c lookups - " if wantarray; pop @net; } sub in_range { my ($ip, $bits) = @_; my $net; $bits =~ s/(.)(.)/$net .= $2; $1/ge; my ($addr, $mask) = map { pack "B32",$_ } $net, $bits; my ($n,$m,$b) = map { unpack"N",$_} $addr,$mask,$addr|~$mask; $ip >= $n && $ip <= $b; }

The big advantages of this method over Net::Patricia:

Disadvantage:

update: tweaked so you can feed it a real AS-table (e.g. ...ip_to_as/bgp.txt) to setup the database. Running this script against the aformentioned file, I get an average of 0.0021 secs / lookup.

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

In reply to IP prefixes: match IP against "best" network by shmem

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.