tgrossner has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file that I need to analyze for matching strings of numbers/text. (It's actually for finding multiple routes in a routing table, but that shouldn't confuse the issue.) So, for instance, I need to search through a file of lines such as these:
10.35.15.64/29 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.15.96/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.2.128/25 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.24.192/26 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.24.48/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.25.128/25 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.25.48/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.25.64/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.25.80/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.26.0/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.26.16/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.26.192/26 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.26.32/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.26.64/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.26.96/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.8.128/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.8.192/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.35.9.128/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
and print out the lines whose text before the first space matches, like these:
10.1.17.0/24 10.133.3.11 OSPF-EXT2 20 if-servermgmt.cntx03-servermgmt-10.133.3.11
10.1.17.0/24 10.133.3.7 OSPF-EXT2 20 if-servermgmt.cntx03-servermgmt-10.133.3.7
I haven't started writing my code, but I would be willing to bet someone knows a command-line string of Perl commands that will do this. Any thoughts?

Re: Searching for two lines that begin with the same string
by jbert (Priest) on Dec 18, 2006 at 16:15 UTC
    You want a hash indexed by the token you want to match. You might as well store the complete lines as the hash value. Then output the values which have more than one entry. (Untested) code:
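    my %route;
    while (my $line = <>) {
        # Grab the route (everything before the first whitespace)
        my ($key) = ($line =~ /^(\S+)\s/);
        next unless defined $key;    # skip lines that have no route field
        # Make sure we have an array ref for this key
        $route{$key} ||= [];
        # And add this line
        push @{$route{$key}}, $line;
    }
    # Print every group of lines that shares a route
    foreach my $values (values %route) {
        if (@$values > 1) {
            print @$values;
            print "\n";
        }
    }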
    Something like that anyway.
      You don't need to make sure there is an array ref for a particular key (your line $route{$key} ||= [];). The push @{$route{$key}}, $line; will push onto an auto-vivified array ref if one doesn't exist, or onto the existing ref otherwise.

      Cheers,

      JohnGG
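
The auto-vivification point above can be checked with a short sketch. This is only an illustration, not code from the thread; the key is taken from the sample data and the pushed strings are placeholders.

```perl
use strict;
use warnings;

my %route;
my $key = '10.1.17.0/24';

# No "$route{$key} ||= []" here: dereferencing the undefined hash value
# as an array inside push creates the array reference on demand.
push @{ $route{$key} }, "first line\n";
push @{ $route{$key} }, "second line\n";

print ref($route{$key}), "\n";            # type of the auto-vivified value
print scalar(@{ $route{$key} }), "\n";    # number of lines stored
```

Under these assumptions the hash value ends up as an array reference holding both lines, so the explicit initialisation is harmless but redundant.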

        Thanks very much. I didn't know that push would auto-vivify in that way. That's going to reduce my line count a bit more in the future.
      How are you populating $line from the file? I have tried opening the file and then going into the while loop, but it doesn't seem to be reading the lines.

        We used <>, which reads from the files named on the command line, or from STDIN if none were specified. You can read from any other file handle instead just by specifying it.

        open(my $data_fh, '<', $data_file_name)
            or die("Unable to open data file \"$data_file_name\": $!\n");
        while (my $line = <$data_fh>) {
            . . .

        You should replace the word "data" with something more descriptive, at least in the error message.

Re: Searching for two lines that begin with the same string
by ikegami (Patriarch) on Dec 18, 2006 at 16:15 UTC
    Do you care about order? If not, use a hash.
    my %data;
    while (<>) {
        my $net = (split ' ')[0];
        push @{$data{$net}}, $_;
    }
    foreach my $group (values %data) {
        if (@$group > 1) {
            print @$group;
        }
    }

    Update: Tested. Added missing if.
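
If order does matter, one option is to remember the order in which each prefix was first seen. The following is a minimal sketch of that idea, not code from the thread: group_in_order() is a hypothetical helper name, and it takes a list of lines rather than reading <> so that it can be tried on inline data.

```perl
use strict;
use warnings;

# Group lines by their first whitespace-separated field and return the
# groups with more than one line, in the order the prefixes first appeared.
sub group_in_order {
    my @lines = @_;
    my (%data, @order);
    for my $line (@lines) {
        my ($net) = split ' ', $line;
        next unless defined $net;                     # skip blank lines
        push @order, $net unless exists $data{$net};  # remember first sighting
        push @{ $data{$net} }, $line;
    }
    return map { @{ $data{$_} } } grep { @{ $data{$_} } > 1 } @order;
}

# Sample lines taken from the question.
my @sample = (
    "10.1.17.0/24 10.133.3.11 OSPF-EXT2 20 if-servermgmt.cntx03-servermgmt-10.133.3.11\n",
    "10.35.15.64/29 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11\n",
    "10.1.17.0/24 10.133.3.7 OSPF-EXT2 20 if-servermgmt.cntx03-servermgmt-10.133.3.7\n",
);
print group_in_order(@sample);
```

To run it against a file, the lines could be supplied as group_in_order(<>) instead of the inline sample.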

Re: Searching for two lines that begin with the same string
by shmem (Chancellor) on Dec 18, 2006 at 16:49 UTC
    perl -ne '/^(\S+)/;if($s{$1}){print$s{$1}unless$S{$s{$1}}++;print}$s{$1}=$_' file

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Just printing each line 2x

        I'll answer both your posts in one:

        Please show a sample.
        Please show a sample.

        Your sample data doesn't contain duplicates. This does:

        10.35.15.64/29 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.15.96/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.2.128/25 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.24.192/26 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.24.48/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.128/25 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.48/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.64/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.80/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.26.0/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-illneveranswerquestionsagain-10.133.3.11
        10.35.26.16/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.26.192/26 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.26.32/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.26.64/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.26.96/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.8.128/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.8.192/27 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.9.128/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11

        If I run my snippet unaltered, I get

        10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11
        10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-illneveranswerquestionsagain-10.133.3.11

        It outputs pairs: the previous and the current match. So if you have 3 lines starting with the same CIDR-notated IP address, you'll get 4 lines back for that match (2 pairs). With 4 lines matching, you get 6 lines: 3 pairs.

        <update>

        perl -ne '/^(\S+)/;if($S{$1}){print$S{$1}unless$s{$1}++;print}$S{$1}=$_' file

        outputs just 4 lines for 4 matches.
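
Spelled out long-hand, the updated one-liner does roughly the following. This is a sketch rather than the posted code: find_duplicates() is a hypothetical name, it returns the lines instead of printing them, and it tests with defined where the one-liner tests for truth.

```perl
use strict;
use warnings;

# %last holds the most recent line seen for each prefix (the one-liner's %S);
# %printed counts how often a prefix has matched before (the one-liner's %s).
sub find_duplicates {
    my @lines = @_;
    my (%last, %printed, @out);
    for my $line (@lines) {
        my ($net) = $line =~ /^(\S+)/ or next;   # first field, skip blank lines
        if (defined $last{$net}) {
            # Emit the earlier line only the first time a prefix repeats ...
            push @out, $last{$net} unless $printed{$net}++;
            # ... and always emit the current line.
            push @out, $line;
        }
        $last{$net} = $line;
    }
    return @out;
}

# Sample lines taken from the thread.
my @sample = (
    "10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11\n",
    "10.35.25.48/28 10.133.3.11 OSPF-EXT1 3 if-servermgmt.cntx03-servermgmt-10.133.3.11\n",
    "10.35.25.32/28 10.133.3.11 OSPF-EXT1 3 if-illneveranswerquestionsagain-10.133.3.11\n",
);
print find_duplicates(@sample);
```

Keeping a separate per-prefix counter is what stops the earlier line from being repeated for every later match.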

        </update>

        --shmem

      That is just spitting out each line 2x.
Re: Searching for two lines that begin with the same string
by jettero (Monsignor) on Dec 18, 2006 at 16:26 UTC

    In addition to the suggestions above, I would also probably try to use Net::Netmask to make sure none of the networks overlap either.
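
As a rough illustration of what an overlap check involves, here is a hand-rolled sketch in plain Perl. It deliberately does not use Net::Netmask, so no module API is assumed; parse_cidr() and cidrs_overlap() are hypothetical names, it handles IPv4 only, and the /23 and /24 addresses in the demo are made up.

```perl
use strict;
use warnings;

# Netmask for a prefix length, as a 32-bit integer.
sub mask_for {
    my ($len) = @_;
    return $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
}

# Convert "a.b.c.d/len" into (network address as integer, prefix length).
sub parse_cidr {
    my ($cidr) = @_;
    my ($ip, $len) = split m{/}, $cidr;
    my $addr = unpack 'N', pack 'C4', split /\./, $ip;
    return ($addr & mask_for($len), $len);
}

# Two blocks overlap when they agree on all bits of the shorter prefix.
sub cidrs_overlap {
    my ($x, $y) = @_;
    my ($net_x, $len_x) = parse_cidr($x);
    my ($net_y, $len_y) = parse_cidr($y);
    my $mask = mask_for($len_x < $len_y ? $len_x : $len_y);
    return ($net_x & $mask) == ($net_y & $mask);
}

print cidrs_overlap('10.35.24.0/23',  '10.35.25.0/24')  ? "overlap\n" : "disjoint\n";
print cidrs_overlap('10.35.25.32/28', '10.35.25.48/28') ? "overlap\n" : "disjoint\n";
```

A maintained module is likely the better choice in practice; the sketch only shows the comparison such a check boils down to.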

    -Paul

      Oddly enough, it's OK if the netmasks overlap; we do some of that for failover reasons. For instance, two /23s and one /24 for the same group of addresses. The "router" this comes from will follow the most specific prefix.