Fangorn has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

My situation is as follows:

I have a list of (all) ip prefixes with assigned AS numbers. I aggregated them, so there are ~120000 of them (prefixes, not as numbers :) ). I need to look them up real quick (meaning - I tell it an individual IP address and want to get a number of AS it is in), so I use Net::Patricia and it does the job well, but...

I have multiple processes using the data, and multiple machines running the same processes. As the data takes ~20MB in memory, it's a little inconvenient to multiply it more than 100 times.

So I thought: MEMCACHE. Or something like this. But it seems that memcache is actually suitable for simple key-value assignments, and I have no idea how I could use it to store my Net::Patricia object.

So - my questions are:

  1. Is there a way to store what I already have in memcache or any other type of cache, and share it among processes (and, preferably, machines)?
  2. If not - what would be a good way to cache ip prefixes with as numbers? Where good means: speed efficient (less than one second lookup times!) and easily cachable, preferably in memcache, but a 'host only' cache would be acceptable. Size is of no importance (since it's going to be shared, so additional couple of megs makes no real difference) as long as it's at least "imaginable" - a hash of hashes with all IPs in the world isn't ;).

Let's assume an sql-database is not an option here. Nor is rbldnsd and dns txt queries.

Thanks in advance,
Daniel

Replies are listed 'Best First'.
Re: (mem)caching of ip prefixes?
by whereiskurt (Friar) on Jun 27, 2007 at 17:48 UTC
    Good day!

    I just wanted to throw this out there: HOWTO Memoize a Function

    I'm not sure if this will do what you want, but it's helped me in the past.

    From the docs it says:`Memoizing' a function makes it faster by trading space for time. It does this by caching the return values of the function in a table. If you call the function again with the same arguments, memoize jumps in and gives you the value out of the table, instead of letting the function compute the value all over again.

    My understanding is you'd basically register the function calls that do the lookup as 'memoized' and anytime you call those functions, with the same params, it will use a lookup table. Maybe it'll help.

    Kurt

      Well, I don't really think it would help.

      First - I don't understand how it would enable sharing the data among processes/machines.

      Second - there are 4228250625 possible questions I might ask (IP addresses) and >65000 answers I might get (AS numbers) - so we're back to 'hash of all IP addresses in the world' thingie, and as an ADDITION to data I already have. And not sharable. Not a good one, no.

      But thanks anyway :)

Re: (mem)caching of ip prefixes?
by ikegami (Patriarch) on Jun 27, 2007 at 19:56 UTC

    Let's assume an sql-database is not an option here.

    Well, you could re-invent the wheel.

Re: (mem)caching of ip prefixes?
by perrin (Chancellor) on Jun 27, 2007 at 19:52 UTC
    The standard perl module for use with memcached sends all your data through Storable, so as long as Net::Patricia objects can be serialized with Storable, you don't need to do anything special to store them in memcached.
Re: (mem)caching of ip prefixes?
by BrowserUk (Patriarch) on Jun 27, 2007 at 23:19 UTC

    How long does it take to do the lookup on average?

    Seems like a standard deamon process that listens on a port might fit the bill. If the clients make permanent connections, the turn around time for sending a query packet and retriving a reply packet should be fairly swift. Certainly for connections from other local processes. If the network latency is to high for a single network deamon, put one copy per box.

    As for caching. A hash containing the 65000 possible answers would only take about 8mb.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Ok, I've been thinking about a daemon, but I thought maybe it is possible to cache the thing somehow and do without establishing another daemon. I considered daemon approach a little desperate, but I'm beginning to think maybe it wouldn't be that bad.

      I understand that hash with all possible answers wouldn't be that big, but it's questions that are making the whole thing impossible, and storing IPs in some memory efficient OR cachable way that provides lightning fast lookup speed is what I seek. I want to supply a single IP and get an answer containing it's AS number. I just can't construct a hash of all ips, it's ~4GB. So I need prefixes, thus I need a method for a really quick lookup of an individual IP in prefix list.

      My data right now is 120000 prefixes. My first (simple) approach was to construct a hash like this:

      %iphash = ( 192 => { 192.168.0.0/24 => 23244, 192.168.1.0/24 => 4324, }, 193 => { 193.1.0.0/16 => 2452 }, );
      and so on, these are of course just some stupid values. Then I would look up the first octet of an ip address, taking the keys of a subhash (which are all the prefixes that start from this octet), iterate through them using Net:IP overlaps and most probably - get an answer finally. But it was more than 3 seconds, and it's way too much. I guess it was pretty much memory consuming too, but since it was too slow I didn't even get to checking memory.

      So I found a Net::Patricia module, fed it with the prefix list and assigned "user_data" which were the AS numbers. Now, using match_string method I get what I want in less than a second. Perfect! However the memory problem occurred and I don't know how to cache it, share it, and so on.

      my $pt = new Net::Patricia; (fill it) my $memd = new Cache::Memcached {(usual stuff)}; $memd->set( 'pt', $pt ); (...) my $ptm = $memd->get('pt'); my $asn = $ptm->match_string($given_ip);

      And what I get is:

      perl in free(): warning: chunk is already free perl in free(): warning: page is already free Assertion failed: (prefix->ref_count > 0), function Deref_Prefix, file + patricia.c, line 362. zsh: abort (core dumped) perl ipmemcached_00.pl

      I'm not saying it "doesn't work", I know I might have just gotten it wrong, although I tried various approaches and all I got where core dumps (which surprised me a lot, I must say) or undefs, hence this thread.

        Well, from what I can see, Cache::Memcached only allows you to store/share simple key/values pairs, not complex objects or data structures. So I don't see any way to use Net::Patricia in conjuction with C::MC.

        If NET::Patricia is fast enough for your purpose, then I see no benefit of further 'caching', and no efficient way to 'share' the N::P object between processes. The next obvious option is to set up the N::P object as server.

        There are all sorts of server types that you could use, and you could write it in Perl as a forking, pre-forking, threaded or select-driven. But using perl to develop a server for high-speed, short transaction services using a single, large, shared data object, the select-model is the only game in town.

        To that end, POE::Component::Server::TCP or even POE::Component::Server::SimpleHTTP would probably serve [sic] your purposes with minimal programming cost.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: (mem)caching of ip prefixes?
by shmem (Chancellor) on Jun 28, 2007 at 13:56 UTC
    Let's assume an sql-database is not an option here.

    update 3: the code in here is obsolete. See IP prefixes: match IP against "best" network.

    You could use just DB_File with DB_BTREE - it's lightning fast and provides partial match:

    #!/usr/bin/perl use DB_File; use Fcntl; my $filename = 'astable.db'; my $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE or die "cannot open $filename: $!\n"; # set up AS table. To perform a binary search, # we build a bit string as a sequence of alternating bits # of network address and mask, i.e. having 10.223.2.0/23, we get # # net: 0 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 + 0 # mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 +0 # key: 10101010111011101111101111111111101010101010110000000000000000 +00 # # for ip searches, we set the "mask part bits" of an ip all to 1, and +do # a incremental partial match against the keys in the table. while(<DATA>) { chomp; my $ip = $_; my ($net,$mask) = split /\//; my @nbits = split//,unpack("B*",pack"C4",split/\./,$net); my @mbits = ((1) x $mask, (0) x (32 - $mask)); # merge bits my $key; for (0..31) { $key .= shift(@mbits) . shift(@nbits); } $h{$key} = $ip; # we just store the cidr for demo } sub match { my $ip = shift; my $packedip = pack "C4",split/\./,$ip; my $numericip = unpack "N", $packedip; my @bits = split //,unpack "B*",$packedip; my $dk; my $c = 1; # reset cursor. $x->seq(my $foo,my $bar,R_FIRST); # now search until mismatch. my ($key,$ok,$v,@net); for(@bits) { my $lk = $key .= 1 . $_; next if $ok and $ok =~ /^$key/; # shortcut $x->seq($lk,$v,R_CURSOR); # check if this key is a candidate push @net, $v if in_range($numericip,$lk); unless ($lk =~ /^$key/ && $v && length($lk) == 64) { print "$c lookups - "; return pop @net; # return net with longest mask } $ok = $lk; $c++; } # only one or no prefix found print "$c lookups - "; pop @net; } sub in_range { my ($ip,$bits) = @_; my $net; $bits =~ s/(.)(.)/$net.=$2;$1/ge; my ($addr, $mask) = map { pack "B32",$_ } $net, $bits; my ($n,$m,$b) = map { unpack"N",$_} $addr,$mask,$addr|~$mask; $ip >= $n && $ip <= $b; } my $ip = shift; print match($ip),"\n\n"; __DATA__ 192.168.12.0/24 10.0.0.0/8 10.223.0.0/16 10.223.2.0/17 10.223.2.0/23 10.223.2.0/25 10.1.2.0/23 10.110.1.0/24 10.110.2.0/24 10.110.3.0/24 10.110.4.0/24 10.110.5.0/28 10.114.1.0/24 10.114.10.0/24 10.114.11.0/24 10.114.12.0/24 10.114.13.0/24 10.114.14.0/24 10.114.15.0/24 10.114.16.0/24 10.114.17.0/24 10.114.18.0/24 10.114.19.0/24 10.114.20.0/24 10.114.21.0/24 10.114.22.0/24 10.114.23.0/24 10.114.24.0/24 10.114.25.0/24 10.114.26.0/24 10.114.27.0/24 10.114.28.0/24 10.114.29.0/24 10.114.3.0/24 10.114.30.0/24 10.114.31.0/24 10.114.32.0/24 10.114.33.0/24 10.114.34.0/24 10.114.35.0/24 10.114.36.0/24 10.114.37.0/24 10.114.38.0/24 10.114.39.0/24 10.114.4.0/24 10.114.40.0/24 10.114.41.0/24 10.114.42.0/24 10.114.43.0/24 10.114.44.0/24 10.114.45.0/28 10.114.45.128/28 10.114.45.144/28 10.114.45.16/28 10.114.45.160/28 10.114.45.176/28 10.114.45.32/28 10.114.45.48/28 10.114.45.64/28 10.114.45.80/28 10.114.45.96/28 10.114.49.0/24 10.114.5.0/24 10.114.6.0/24 10.114.7.0/24 10.114.9.0/24 10.116.1.0/24 10.116.10.0/24 10.116.11.0/24 10.116.12.0/24 10.116.13.0/24 10.116.14.0/24 10.116.15.0/24 10.116.16.0/24 10.116.17.0/24 10.116.18.0/24 10.116.19.0/28 10.116.19.112/28 10.116.19.16/28 10.116.2.0/24 10.116.20.0/24 10.116.21.0/24 10.116.22.0/24 10.116.25.0/24 10.116.27.0/24 10.116.3.0/24 10.116.4.0/24 10.116.5.0/24 10.116.6.0/24 10.116.7.0/24 10.116.8.0/24 10.116.9.0/24 10.118.1.0/24 10.118.2.0/24 10.118.3.0/24 10.118.4.0/24 10.118.5.0/28 10.122.1.0/24 10.122.2.0/24 10.122.3.0/24 10.122.4.0/24 10.122.5.0/24 10.122.6.0/24 10.122.7.0/28 10.122.8.0/24 10.122.9.0/28 10.143.1.0/25 10.16.11.0/24 10.16.2.0/23 10.16.4.0/24 10.16.8.0/24 10.184.0.0/28 10.184.1.0/28 10.184.1.128/28 10.184.1.144/28 10.184.1.16/28 10.184.1.160/28 10.184.1.176/28 10.184.1.32/28 10.184.1.48/28 10.250.11.0/24 10.250.5.0/24 10.67.1.0/24 10.67.4.0/24

    Exapmples:

    perl ip.pl 10.114.45.79 11 lookups - 10.114.45.64/28 perl ip.pl 10.223.123.34 5 lookups - 10.223.2.0/17 perl ip.pl 10.223.2.126 7 lookups - 10.223.2.0/25

    Let me know how that behaves with 120000 keys. The number of keys should not matter, and there should be at most 32 lookups per IP, which are done in a blink of an eye. My tests show a lookup rate of roughly 1300 adresses per seconds on average (1.5 GHz Laptop).

    update: corrected bug in in_range() - broadcast calculation was broken, bug was introduced by reformatting...

    update 2: sorry for that hack - horrible style... I'll clean up the ugliness, eventually :)

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: (mem)caching of ip prefixes?
by BrowserUk (Patriarch) on Jun 29, 2007 at 08:03 UTC

    Actually, Net::Patricia turns out to be most impressive, giving sub-30 microsecond lookup times against a database of over 203,000 prefixes.

    Even using the crude, single tasking, blocking server below, it is able to achieve and sustain a throughput of over 100 queries per second using barely 4% of my 2.66GHz processor, when being queried from 100 clients simultaneously, as fast as they can go. Real impressive. Network latency is likely to be your limiting factor.

    #! perl -slw use strict; use IO::Socket::INET; use Net::Patricia; use Time::HiRes qw[ time ]; my $pat = Net::Patricia->new; open I, '<', 'IP-to-ASmapping.txt' or die $!; $pat->add_string( split ' ', $_ ) while <I>; close I; my $listener = IO::Socket::INET->new( LocalAddr => 'localhost:88888', Listen => 1000, Reuse => 1, ) or die "Couldn't listen on localhost:88888; $!"; my $time = time(); my $count = 0; while( my $client = $listener->accept ) { printf "\rRate: %7.3f\t", $count / ( time() -$time ) if ++$count % + 100; chomp( my $ip = <$client> ); print { $client } $pat->match_string( $ip ) || 'n/a'; close $client; } __END__ C:\test>623661-s Rate: 108.149

    And the equally crude threaded client firing randomly generated IPs at the server:

    #! perl -slw use strict; use threads; use threads::shared; use IO::Socket::INET; our $I ||= 1000; our $T ||= 100; my $running :shared = 0; sub thread { { lock $running; ++$running } for( 1 .. $I ) { my $server = new IO::Socket::INET( 'localhost:88888' ) or warn "connect failed: $!" and sleep 1 and next; my $ip = join '.', map{ int rand 256 } 1 .. 4; print $server $ip; printf "$ip was reported as a member of %s", scalar( <$server> + ); close $server; } { lock $running; --$running }; } threads->create( \&thread )->detach for 1 .. $T; sleep 1 while $running < $T; while( 1 ) { sleep 1 until $running < $T; threads->create( \&thread )->detach; } __END__ ... 86.11.176.198 was reported as a member of n/a 222.75.137.143 was reported as a member of n/a 161.31.232.201 was reported as a member of 21852 93.42.178.58 was reported as a member of n/a 219.220.7.27 was reported as a member of 4538 235.24.7.24 was reported as a member of n/a 227.191.55.46 was reported as a member of n/a 206.129.179.146 was reported as a member of 6347 241.200.47.144 was reported as a member of n/a 114.121.196.146 was reported as a member of n/a 23.138.46.31 was reported as a member of n/a 107.39.33.164 was reported as a member of n/a 183.93.218.229 was reported as a member of n/a 223.233.44.15 was reported as a member of n/a 232.140.217.73 was reported as a member of n/a 203.175.78.140 was reported as a member of n/a 204.118.18.156 was reported as a member of 1239 33.185.129.154 was reported as a member of 721 229.141.204.6 was reported as a member of n/a 66.187.219.196 was reported as a member of 22183 132.103.88.186 was reported as a member of 568 49.161.3.238 was reported as a member of n/a ...

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Whoa, thanks! :)

      I'm going to test both daemon and DB file solution and compare results (and post them of course).

      Meanwhile, did anyone try to implement Lulea algorithm for prefix lookup in Perl? http://www.sigcomm.org/sigcomm97/papers/p192.pdf

      Seems pretty much sophisticated.

Re: (mem)caching of ip prefixes?
by roboticus (Chancellor) on Jun 28, 2007 at 11:30 UTC

    Fangorn:

    20MB seems a bit much for storing a 120K lookup table. If all you need is a lookup table to map an IP prefix to a complete address, you might consider an alternative data structure.

    Hashes are fantastic, as they let you ignore lots of arcane data structures, but when they consume too much ram for your app, it's time to consider alternatives. You might find a a ranged structure to be more space-efficient (e.g. "127.0.0 - 127.255.255 -> 192.168.3.5") if you have dense clusters of IP prefixes, or perhaps a multi-level tree may be good. Look over your mapping function to see if there is some commonality you could exploit to construct a data structure to simplify things.

    If your mapping is somewhat arbitrary (i.e., you can change it around some), you might even consider playing around with computing the new IP address, or computing a lookup into a small index->address table.

    ...roboticus