ewhitt has asked for the wisdom of the Perl Monks concerning the following question:

Need some help trying to regex the address portion of an IPv6 address. I am struggling with matching A) the hex characters and B) different address lengths, e.g.

2001:1::/64
2001:0:6:4003::/64
2001::12/128
...etc

Any suggestions would be appreciated.

Replies are listed 'Best First'.
Re: Regex host portion of IPv6 address
by Anonymous Monk on Mar 28, 2008 at 09:14 UTC
Re: Regex host portion of IPv6 address
by quester (Vicar) on Mar 28, 2008 at 09:44 UTC
    I think all the corner cases in IPv6 addresses are likely to make you crazy if you try to do the whole problem in an RE. You could try the is_ipv6 function here:
    my $quad     = "[0-9a-fA-F]{0,4}";
    my $ipv6addr = "(?:$quad:){2,7}$quad";

    sub is_ipv6 {
        local $_ = $_[0];
        return 0 unless m{^$ipv6addr(?:/(\d+))?$};
        my $mask = defined($1) ? $1 : 128;    # in 5.10: my $mask = $1 // 128;
        return 0 if /:::/ or /::.*::/ or not /::/ and 7 != tr/:/:/;
        return ( 0 <= $mask and $mask <= 128 );
    }
    But you might be better off just trying to convert the address with NetAddr::IP->new6 and seeing whether the result is true. A quick and dirty test comparing the two approaches...
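    The "let a library decide" idea can also be sketched with nothing but core modules: Socket's inet_pton returns undef for text it cannot parse as an IPv6 address. This is a stand-in for NetAddr::IP, not the comparison mentioned here. The helper name parses_as_ipv6 and the prefix-length check are assumptions added for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET6);

# Hypothetical helper: true if $text is "address" or "address/prefixlen"
# and the address part is accepted by the system's IPv6 parser.
sub parses_as_ipv6 {
    my ($text) = @_;

    # Split off an optional /prefixlen; give up if the shape is wrong.
    my ($addr, $len) = $text =~ m{^([^/]+)(?:/(\d{1,3}))?$} or return 0;
    return 0 if defined $len && $len > 128;

    # inet_pton returns undef when the text is not a valid address.
    return defined inet_pton(AF_INET6, $addr) ? 1 : 0;
}

for my $candidate ('2001:1::/64', '2001::12/128', '2001:::1') {
    printf "%-14s %s\n", $candidate,
        parses_as_ipv6($candidate) ? 'ok' : 'rejected';
}
```

    Since the operating system does the parsing, the corner cases (one "::" at most, dotted-decimal tails) are handled without a hand-written pattern.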
      Thanks for the response. Your code goes beyond my understanding. I thought I was close with my code. ;-)
      if ($line =~ m{\s+Prefix\s+((.*)::(\w{1,4})/(\d{1,3}))} ) { $ipv6Address = "$1::$2/$3"; }
      I am able to match

      Prefix 2001::1/128
      Prefix 2001::2/128

      but not

      Prefix 2001:1::/64

        The part that's hard to see is that there must be exactly one double colon :: in the address ... unless there are exactly seven colons, in which case there must be no double colon. For example,

        0001:0002:0003:0004:0005:0006:0007:0008
        

        The other and more important thing - which I forgot altogether - is that addresses that are mapped from IPv4 to IPv6 can optionally be written with the last two groups of hex digits replaced by a dotted-decimal IPv4 address, as in these two:

        ::ffff:10.32.12.1
        ::10.32.12.1
        

        which are equivalent to

        ::ffff:0a20:0c01
        ::0a20:0c01
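
        The equivalence is just each pair of decimal octets written as one four-digit hex group. A minimal sketch, assuming the dotted part sits at the very end of the string and each octet is already in the range 0 to 255 (neither is checked here); ipv4_tail_to_hex is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a trailing dotted-decimal IPv4 part
# as two groups of four hex digits.
sub ipv4_tail_to_hex {
    my ($addr) = @_;
    $addr =~ s{(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$}
              {sprintf '%02x%02x:%02x%02x', $1, $2, $3, $4}e;
    return $addr;
}

print ipv4_tail_to_hex('::ffff:10.32.12.1'), "\n";   # ::ffff:0a20:0c01
print ipv4_tail_to_hex('::10.32.12.1'), "\n";        # ::0a20:0c01
```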
        

        I think the suggestion from an Anonymous Monk of "Work less, use Net::IPv6Addr" is really a much better idea.

        Adding the code to parse mapped address would turn the code I suggested from a mess into an abomination... a useful one to be sure, but still very very abominable. As Otto von Bismarck might say, "Some CPAN modules are like laws and sausages, it is better not to see them being made."

Re: Regex host portion of IPv6 address
by steph_bow (Pilgrim) on Mar 28, 2008 at 08:56 UTC

    My suggestion (tested)

    my $infile = "try.txt";
    open my $INFILE, q{<}, $infile or die;
    my $outfile = "results.txt";
    open my $OUTFILE, q{>}, $outfile or die;

    while (my $line = <$INFILE>){
        $line =~ s/\s+$//;
        print STDOUT "the line is $line\n";
        #if ($line =~ /(\d+):(\d)::\/(\d+)/){
        if ($line =~ /(\d+):(.*)\/(\d+)/){
            print $OUTFILE "the first decimals are $1\n";
            print $OUTFILE "the last decimals are $3\n";
            print $OUTFILE "the middle element is $2\n";
        }
        print $OUTFILE "\n";
    }
    close $INFILE;
    close $OUTFILE;

    In $1: you have 2001

    Perl understands the ":" in the regex

    in $2, you have 1:: or 0:6:4003:: or :12

    in $3, you have 64 or 128

    Hope it helps !

    Here are the results

    the first decimals are 2001
    the last decimals are 64
    the middle element is 1::

    the first decimals are 2001
    the last decimals are 64
    the middle element is 0:6:4003::

    the first decimals are 2001
    the last decimals are 128
    the middle element is :12
Re: Regex host portion of IPv6 address
by apl (Monsignor) on Mar 28, 2008 at 09:48 UTC
Re: Regex host portion of IPv6 address
by locked_user sundialsvc4 (Abbot) on Mar 28, 2008 at 13:49 UTC

    The function split might be all you need: my @parts = split(/\:/, $address); or somesuch.
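
    A minimal sketch of the split idea, with two details that are easy to miss: the /length has to come off first, and split drops trailing empty fields unless given a negative limit, which would hide a trailing "::". The helper name split_prefix is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the prefix length followed by the
# colon-separated fields of the address.
sub split_prefix {
    my ($prefix) = @_;

    # Separate the address from the prefix length first.
    my ($address, $length) = split m{/}, $prefix, 2;

    # A limit of -1 keeps trailing empty fields, so "::" stays visible.
    my @parts = split /:/, $address, -1;

    return ($length, @parts);
}

my ($length, @parts) = split_prefix('2001:0:6:4003::/64');
print "length $length, ", scalar(@parts), " fields: ", join('|', @parts), "\n";
# prints: length 64, 6 fields: 2001|0|6|4003||
```

    The empty fields mark where the "::" was. Working out how many zero groups it stands for is still left to the caller.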

    But the recommendation to “use CPAN” is the best one overall. You see, right now you are struggling to solve a problem that has already been solved. Perhaps without thinking too much about it at the time, you dropped straight to “how” you could solve the problem... when it is very often better to first consider “what” the essential problem is. At the end of the day, what you want to walk away with is simply the result, and if you can completely avoid dealing with an algorithm, so much the better.

      I agree with the CPAN recommendation. The reason I was taking my approach was to extract the IPv6 address from strings of text, then break it down. Net::IPv6Addr::ipv6_parse works great for the latter, but I am still trying to figure out the first part.
        UTSL
        my %ipv6_patterns = (
            'preferred' => [
                qr/^(?:[a-f0-9]{1,4}:){7}[a-f0-9]{1,4}$/i,
                \&ipv6_parse_preferred,
            ],
            'compressed' => [
                ## No, this isn't pretty.
                qr/^[a-f0-9]{0,4}::$/i,
                qr/^:(?::[a-f0-9]{1,4}){1,6}$/i,
                qr/^(?:[a-f0-9]{1,4}:){1,6}:$/i,
                qr/^(?:[a-f0-9]{1,4}:)(?::[a-f0-9]{1,4}){1,6}$/i,
                qr/^(?:[a-f0-9]{1,4}:){2}(?::[a-f0-9]{1,4}){1,5}$/i,
                qr/^(?:[a-f0-9]{1,4}:){3}(?::[a-f0-9]{1,4}){1,4}$/i,
                qr/^(?:[a-f0-9]{1,4}:){4}(?::[a-f0-9]{1,4}){1,3}$/i,
                qr/^(?:[a-f0-9]{1,4}:){5}(?::[a-f0-9]{1,4}){1,2}$/i,
                qr/^(?:[a-f0-9]{1,4}:){6}(?::[a-f0-9]{1,4})$/i,
                \&ipv6_parse_compressed,
            ],
            'ipv4' => [
                qr/^(?:0:){5}ffff:(?:\d{1,3}\.){3}\d{1,3}$/i,
                qr/^(?:0:){6}(?:\d{1,3}\.){3}\d{1,3}$/,
                \&ipv6_parse_ipv4,
            ],
            'ipv4 compressed' => [
                qr/^::(?:ffff:)?(?:\d{1,3}\.){3}\d{1,3}$/i,
                \&ipv6_parse_ipv4_compressed,
            ],
        );
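
        Those patterns are all anchored with ^ and $, so they validate a string that is already isolated. For the extraction step, a deliberately loose pattern can pull a candidate out of the line and leave validation to the parser. A minimal sketch, assuming input lines shaped like the "Prefix 2001:1::/64" examples earlier in the thread; extract_prefix and $candidate are hypothetical names:

```perl
use strict;
use warnings;

# Loose candidate: a run of hex digits, colons and dots containing at
# least one colon, followed by an optional /length. This only locates
# the text; it does not prove the address is valid.
my $candidate = qr{([0-9a-fA-F:.]*:[0-9a-fA-F:.]*)(?:/(\d{1,3}))?};

# Hypothetical helper: returns (address, length) from a "Prefix ..." line,
# or an empty list if nothing address-like follows the keyword.
sub extract_prefix {
    my ($line) = @_;
    return unless $line =~ m{\bPrefix\s+$candidate};
    return ($1, $2);
}

for my $line ('  Prefix 2001::1/128', '  Prefix 2001:1::/64') {
    my ($address, $length) = extract_prefix($line);
    print "$address (length $length)\n" if defined $address;
}
```

        The extracted address can then be handed to whichever validator you trust. Keeping extraction permissive and validation strict avoids repeating the corner cases in two places.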