jorain has asked for the wisdom of the Perl Monks concerning the following question:

I'm a Perl newbie and need help comparing two files that contain IP addresses to find matches. I need to match on the first three octets from both files first, then I have to handle a range (e.g., 123.45.67.89-123.89.45.67). I would appreciate any assistance.
File1
138.63.20.48
63.208.170.231
132.3.0.193
63.208.170.198
63.236.1.136
63.236.1.139
205.161.5.239

The second file looks like this:

File2
Company1-100.45.0.0-100.45.255.255
Company2-227.133.171.0-227.133.173.0
Company3-63.208.170.5-63.208.170.254
Company4-95.214.36.0-95.214.39.255
Company5-35.117.181.0-35.117.181.127
Company6-55.207.128.0-55.207.143.255
Company7-138.63.20.12-138.63.20.95
I need a script that will match the IP 138.63.20.48 with the Company7 info in File2. The code I have started looks like this:
open (IP1,"ipsnew.txt");
open (WHERE,"whereby.txt");

while (<IP1>) {
    if (/^(.*?)(\d+\.\d+\.\d+)(.*?)$/) {
        $beg_line1{2} = $1;
        $ip1 = $2;
        # print IP1 "$ip1\n";
        # print "@ip1\n";
    }
}

while (<WHERE>) {
    if (/^(.*?)(\d+\.\d+\.\d+)(.*?)$/) {
        $beg_line2 = $1;
        $ip2 = $2;
        # print IP1 "$ip1\n";
        # print "@ip2\n";
    }
}

if ($beg_line1{$ip1} == $beg_line2{$ip2}) {
    print "we have a match at $ip1\n";
}
I would appreciate any help.

Replies are listed 'Best First'.
Re: Compare two files (Ip addresses)
by NetWallah (Canon) on May 16, 2007 at 05:03 UTC
    Others have pointed out various structures and file processing mechanics.

    There are two modules that will help you validate and manipulate IP addresses:

    • Regexp::Common will help validate/extract IP addresses from text
    • NetAddr::IP will help deal with IP address ranges, and easily discover if a particular address is within a specified range
    my $ip = new NetAddr::IP 'loopback';

    print "The address is ", $ip->addr, " with mask ", $ip->mask, "\n" ;

    if ($ip->within(new NetAddr::IP "127.0.0.0", "255.0.0.0")) {
        print "Is a loopback address\n";
    }

    # Or - more likely you want to use this ..
    # $me->contains($other) ...

         "An undefined problem has an infinite number of solutions." - Robert A. Humphrey
         "If you're not part of the solution, you're part of the precipitate." - Henry J. Tillman

Re: Compare two files (Ip addresses)
by graff (Chancellor) on May 16, 2007 at 02:23 UTC
    I think it would make more sense to read File2 first, to get the labels that need to be associated with various IP ranges. If the known IP values are treated as hash keys (and company names are the hash values), then it becomes very simple to look up the addresses in File1, and spit out the company name when there's a match.

    You just need to make sure to handle the IP-range issues properly -- if I understand the question, a File2 entry like "Company6-55.207.128.0-55.207.143.255" would be a hit for any File1 IP whose first two components are 55.207 and whose third component falls between 128 and 143. Something like this could get you started:

    use strict;

    my %ip_company;

    # Read the company ranges first (File2 in the question):
    # each line looks like "Company-startIP-endIP"
    open( I, "File2" ) or die "File2: $!";
    while (<I>) {
        chomp;
        my ( $company, $bgn_IP, $end_IP ) = split /-/;
        next unless ( $bgn_IP =~ /^(\d+\.\d+\.)(\d+)\.\d+$/ );
        my ( $bgn_q12, $bgn_q3 ) = ( $1, $2 );
        next unless ( $end_IP =~ /^(\d+\.\d+\.)(\d+)\.\d+$/ );
        my ( $end_q12, $end_q3 ) = ( $1, $2 );

        # NB: if $bgn_q12 ne $end_q12, we need some different logic...

        # Key on the first three octets, one key per third-octet value
        $ip_company{$bgn_q12.$bgn_q3} = $company;
        if ( $bgn_q3 != $end_q3 ) {
            for my $next_q3 ( $bgn_q3+1 .. $end_q3 ) {
                $ip_company{$bgn_q12.$next_q3} = $company;
            }
        }
    }

    # Now look up each plain IP address (File1 in the question)
    open( I, "File1" ) or die "File1: $!";
    while (<I>) {
        chomp;
        ( my $lookup = $_ ) =~ s/\.\d+$//;   # drop the last octet
        if ( exists( $ip_company{$lookup} )) {
            print "$_ is part of $ip_company{$lookup}\n";
        }
        else {
            print "$_ is not part of any known company\n";
        }
    }
    (not tested)

    Handling a range like "123.45.67.89-123.89.45.67" is left as an exercise... (or maybe you don't have to go there).

    (update: fixed last sentence so it makes sense)
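
    For what it's worth, one way to approach ranges like that (and to honor the last octet as well) is to turn each dotted quad into a 32-bit integer and do a plain numeric comparison. The following is only a minimal sketch under that assumption, not a tested solution for the real data; the helper names ip_to_int, parse_ranges and find_company are made up for illustration, and it assumes every File2 line has the form "Name-startIP-endIP". It uses only core Perl.

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 string to a 32-bit integer.
# Returns undef (in scalar context) if the string is malformed.
sub ip_to_int {
    my ($ip) = @_;
    my @octets = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or return;
    return if grep { $_ > 255 } @octets;
    return unpack 'N', pack 'C4', @octets;
}

# Parse lines of the assumed form "Name-startIP-endIP"
# into [name, low, high] records; malformed lines are skipped.
sub parse_ranges {
    my @ranges;
    for my $line (@_) {
        my ( $name, $start, $end ) =
            $line =~ /^(.+)-(\d+(?:\.\d+){3})-(\d+(?:\.\d+){3})\s*$/
            or next;
        my $lo = ip_to_int($start);
        my $hi = ip_to_int($end);
        next unless defined $lo && defined $hi && $lo <= $hi;
        push @ranges, [ $name, $lo, $hi ];
    }
    return @ranges;
}

# Return the name of the first range containing $ip, or undef if none does.
sub find_company {
    my ( $ip, @ranges ) = @_;
    my $n = ip_to_int($ip);
    return unless defined $n;
    for my $r (@ranges) {
        my ( $name, $lo, $hi ) = @$r;
        return $name if $lo <= $n && $n <= $hi;
    }
    return;
}

# Sample data copied from the question
my @ranges = parse_ranges(
    'Company3-63.208.170.5-63.208.170.254',
    'Company7-138.63.20.12-138.63.20.95',
);

for my $ip (qw(138.63.20.48 63.208.170.231 132.3.0.193)) {
    my $company = find_company( $ip, @ranges );
    print defined $company
        ? "$ip is part of $company\n"
        : "$ip is not part of any known company\n";
}
```

    With the sample lines above, 138.63.20.48 should land in Company7 and 63.208.170.231 in Company3, while 132.3.0.193 matches nothing. The linear scan over the ranges is fine for a handful of companies; for a large File2, sorting the ranges and using a binary search would probably be worth considering.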

Re: Compare two files (Ip addresses)
by thezip (Vicar) on May 15, 2007 at 22:52 UTC

    Your data representation in File2 does not match the filespec you describe verbally (i.e., Company2 spans multiple subnets, and hence it is unclear how or why it would match the entries from File1 if only the last octet is to be compared).

    Do you have any control over how data will be represented in the files? If you do, you might benefit from redesigning the data file layouts.

    Also, please reformat your question with code tags, as in:

    <code> ... your tidy Perl code goes here ... </code>

    Where do you want *them* to go today?