monger has asked for the wisdom of the Perl Monks concerning the following question:

I am a newbie to Perl. I am working on a script to take a config file from a firewall and strip out all the IPs that we own. Then I will dump the matching IPs to a file, sort the file with 'sort -u', and read the file back in to do an nslookup. That's the goal. The problem is in the matching. The following is the regex I am using for the match. It's just returning too much info.
$regex = '\w159\.230\.([01]?\d\d?|2[0-4]\d|25[0-4])\.([01]?\d\d?|2[0-4]\d|25[0-4])';
while (<INFILE>) {
    if (m/$regex/os) {
        print PIXOUT $_;
Here is the return. And I apologize, the last two quads are necessarily obfuscated.
network-object 159.230.xxx.xxx 255.255.224.0
network-object 159.230.xxx.xxx 255.255.252.0
network-object 159.230.xxx.xxx 255.255.255.0
network-object host 159.230.xxx.xxx
network-object host 159.230.xxx.xxx
network-object host 159.230.xxx.xxx
What I want to have in the output file is simply 159.230.xxx.xxx. I have tried several different things, including \w before the regex and $/ afterwards. Those produce no results. The regex as you see it now is what works. This is all being done on a RedHat Linux 7.2 box with Perl 5.6.0. Thanks, Monger

update (broquaint): added <code> to the sample output

Replies are listed 'Best First'.
Re: Match only certain IP Addresses
by fs (Monk) on Aug 06, 2003 at 21:07 UTC
    You're falling into a very common mistake made in dealing with IP addresses. They're actually just 32-bit numbers that happen to be commonly written in a really weird way: dotted quad notation. Use the functions inet_ntoa and inet_aton (perldoc Socket for details) to convert them to and from their packed form, and you can quickly and easily sort them properly in Perl. If you try to sort them with sort -u, then you'll have problems. For example, '10' will get placed before '5'. First, to get all of the IP addresses out of a text file, use a loop something like this (note that this assumes that there is at most one IP address per line):
    my @ips;
    while (<INPUT>) {
        # capture the first dotted quad on the line
        if (/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
            push @ips, $1;
        }
    }
    You don't really say what your IP range is, so as an example I'll use the network 192.168.20.0 with a netmask of 255.255.255.0. This will pick out only the addresses in this network, sort them numerically, and print them out.
    use Socket;
    my $mask = 24; # 255.255.255.0 -> 24 bit netmask
    my $network = inet_aton("192.168.20.0");
    my @myips;
    foreach $ip ( @ips ) {
        $ip = inet_ntoa($ip); # convert ascii to decimal
        if( ($ip & $mask) == ($network & $mask) ){
            push @myips, $ip;
        }
    }
    # this will sort the list properly since it's sorting the numbers.
    # otherwise it would sort 192.168.20.20 before 192.168.20.3
    @myips = sort @myips;
    foreach $ip ( @myips ) {
        # print out the ascii format
        print inet_ntoa($ip), "\n";
    }
      If you try to sort them with sort -u, then you'll have problems. For example, '10' will get placed before '5'.

      not necessarily true...

      $ cat sorttest
      10.1.1.2
      100.1.3.2
      10.20.3.2
      10.2.4.3
      5.4.3.2
      5.10.3.2
      5.5.3.2
      $ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n sorttest
      5.4.3.2
      5.5.3.2
      5.10.3.2
      10.1.1.2
      10.2.4.3
      10.20.3.2
      100.1.3.2
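If you would rather stay inside Perl than shell out to sort, something like the following should give the same ordering and also drop duplicates (the job 'sort -u' was doing). This is only a sketch: sort_unique_ips is a made-up helper name, and it assumes every input string is already a plain dotted quad, since inet_aton may try a hostname lookup on anything else.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper (not from the thread): return the unique addresses
# from a list, ordered numerically by octet.
sub sort_unique_ips {
    my @ips = @_;
    my %seen;
    # inet_aton packs a dotted quad into 4 bytes in network order, so a
    # plain string comparison of the packed form orders addresses numerically.
    my @packed = grep { defined $_ && !$seen{$_}++ }
                 map  { inet_aton($_) } @ips;
    return map { inet_ntoa($_) } sort { $a cmp $b } @packed;
}

# Same sample addresses as the sorttest file above, plus one duplicate.
print "$_\n" for sort_unique_ips(qw(
    10.1.1.2 100.1.3.2 10.20.3.2 10.2.4.3 5.4.3.2 5.10.3.2 5.5.3.2 5.4.3.2
));
```

Sorting the packed strings rather than the text avoids having to split on dots and compare four fields by hand.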

      I think you meant

      my $mask = inet_aton('255.255.255.0');
      and
      $ip = inet_aton($ip); # convert ascii to decimal
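Putting those two corrections together, the netmask test might look roughly like this. Treat it as a sketch, not a drop-in fix: ips_in_network is a made-up name, the sample addresses are invented, and it compares the masked values with eq because inet_aton returns packed 4-byte strings, which Perl ANDs bytewise when both operands are strings.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper (not from the thread): keep only the addresses that
# fall inside $network/$netmask, de-duplicated and in numeric order.
sub ips_in_network {
    my ($network, $netmask, @ips) = @_;
    my $mask = inet_aton($netmask);
    my $net  = inet_aton($network) & $mask;   # string-wise AND on packed bytes
    my %seen;
    my @packed = grep { defined $_ && ($_ & $mask) eq $net && !$seen{$_}++ }
                 map  { inet_aton($_) } @ips;
    # Packed addresses sort correctly as plain strings.
    return map { inet_ntoa($_) } sort @packed;
}

# Invented sample data for illustration only.
print "$_\n" for ips_in_network(
    '192.168.20.0', '255.255.255.0',
    qw(192.168.20.20 10.0.0.1 192.168.20.3 192.168.21.7 192.168.20.3),
);
```

For the original question the same call would take 159.230.0.0 and 255.255.0.0, assuming the whole /16 is the range of interest.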

      --Bob Niederman, http://bob-n.com
Re: Match only certain IP Addresses
by Mr. Muskrat (Canon) on Aug 06, 2003 at 21:15 UTC
Re: Match only certain IP Addresses
by sgifford (Prior) on Aug 06, 2003 at 21:03 UTC
    Your problem is that when you print $_, you're printing the entire line. You want to just print the part of the regex that matched. The easiest way to do it is to capture that part of the regex with parentheses, then print out the captured portion:
    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        if (/\s(159\.230\.\d+\.\d+)/) {
            print $1,"\n";
        }
    }
    __DATA__
    network-object 159.230.123.000 255.255.224.0
    network-object 159.230.234.000 255.255.252.0
    network-object 159.230.111.000 255.255.255.0
    network-object host 159.230.101.000
    network-object host 159.230.200.000
    network-object host 159.230.6.000
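If you also want the octet range check from the original regex and only one copy of each address, the capture idea can be extended along these lines. This is a sketch under a few assumptions: extract_our_ips and the sample lines are invented, octets are allowed to run 0-255 (the original pattern stopped at 254), and zero-padded octets such as 000 are deliberately not matched.

```perl
use strict;
use warnings;

# One octet in the range 0-255, without leading zeros.
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;

# 159.230.x.x that is not part of a longer number or a longer dotted string.
my $ours = qr/(?<![\d.])(159\.230\.$octet\.$octet)(?!\d|\.\d)/;

# Hypothetical helper (not from the thread): return each matching address
# once, in the order it was first seen.
sub extract_our_ips {
    my @lines = @_;
    my (%seen, @found);
    for my $line (@lines) {
        # /g in a while loop picks up more than one address per line.
        while ($line =~ /$ours/g) {
            push @found, $1 unless $seen{$1}++;
        }
    }
    return @found;
}

# Invented sample lines for illustration only.
print "$_\n" for extract_our_ips(
    'network-object 159.230.12.0 255.255.224.0',
    'network-object host 159.230.101.7',
    'network-object host 159.230.101.7',
    'network-object host 192.168.1.1',
);
```

Tracking seen addresses in a hash removes the need for the intermediate file and the 'sort -u' pass just to get unique values.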
Re: Match only certain IP Addresses
by phydeauxarff (Priest) on Aug 07, 2003 at 01:03 UTC
    In addition to the excellent advice you have already received, if you are going to be doing much manipulation of IP addresses, you will definitely want to become friends with the CPAN module Net::IP...it is a life saver!
      And if you're going to be working with ranges of IPs, Net::CIDR. Both Net::IP and Net::CIDR are IPv6-friendly.

      And if you need really fast lookups, give Net::Patricia a try. It's good at what it does.

Re: Match only certain IP Addresses
by monktim (Friar) on Aug 06, 2003 at 21:14 UTC
    Here is a little more than you need. The pattern match grabs everything up to the first space. The keys in the hash contain the unique ip addresses. The values tell you how many times they appeared during processing. This is a classic counting example.
    use strict;
    use warnings;
    use Socket;

    my @ip = (
        '159.230.1.1 255.255.224.0',
        '159.230.1.20 255.255.252.0',
        '158.230.1.3 255.255.255.0',
        '159.230.1.3 255.255.255.0',
        '159.230.1.20 255.255.1.1'
    );
    print join ("\n", @ip);
    print "\n\n";

    my %count;
    foreach (@ip) {
        # count each 159.230 address, keyed by its packed form
        $count{inet_aton($&)}++ if /159\.230\..*?( |$)/;
    }

    open(FILE, '>ip.txt') or die "Cannot open ip.txt: $!";
    foreach (sort keys %count) {
        print inet_ntoa($_)." appears $count{$_} times.\n";
        print FILE inet_ntoa($_)."\n";
    }
    close FILE;
    UPDATE: Good point on the sort not working out. I put the inet functions in like other posts did. I reread the question and realized the user only wants 159.230 addresses, so I changed the pattern match from /.*? /. That change will also get rid of the leading text if it exists. Of course, the \d{1,3} match the other poster mentioned does a little more verification on your data if you decide you need it.
    Good luck.