rrboloor has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to build a small tool for analyzing dial-peers on Cisco voice gateways. In summary, I have a lot of dialed-number patterns (regex-like numbers), each directed to a different port.
For example:
56.. --> directed to port 1 (. matches any single digit)
567. --> directed to port 2
5... --> directed to port x
Now the user inputs the number 5679; the output should be port 2.
Input -------- output
5600 -------- port1
5700 -------- port x
How can I choose the best pattern in Perl, given that 5679 matches all of the dialed-number patterns?
Any help would be greatly appreciated.

Re: Longest match finding.
by GrandFather (Saint) on Dec 17, 2008 at 10:03 UTC

    If you are building this at run time, some sort of decision tree is probably the best technique. I'd be inclined to use a nested hash. Consider:

    use strict;
    use warnings;

    my $input = "5600\n5679\n5700\n";

    # Build the tree: one nested-hash level per literal digit of each pattern
    my %tree;
    while (<DATA>) {
        s/^\s+|\s+$//g;    # trim, including the trailing newline
        next unless length;
        my ($number, $port) = split ',';
        my $node = \%tree;
        for my $digit (split '', $number) {
            last if $digit eq '.';    # the literal prefix ends at the first wildcard
            if (exists $node->{$digit}) {
                $node = $node->{$digit};
                next;
            }
            $node = $node->{$digit} = {};
        }
        $node->{port} = $port;
    }

    # Process input numbers: follow the digits as far as the tree allows,
    # remembering the port of the deepest node that has one
    open my $inFile, '<', \$input or die "Failed to open input file: $!";
    while (defined (my $number = <$inFile>)) {
        chomp $number;
        my $node = \%tree;
        my $port = $node->{port};
        for my $digit (split '', $number) {
            last unless exists $node->{$digit};
            $node = $node->{$digit};
            $port = $node->{port} if exists $node->{port};
        }
        print "$number---", (defined $port ? "port$port" : 'no match'), "\n";
    }

    __DATA__
    56..,1
    567.,2
    5...,x

    Prints:

    5600---port1
    5679---port2
    5700---portx
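    The tree above stops at the first '.', so each pattern acts as a prefix and its length is not enforced (57 or 570012 would also come back as port x). A minimal sketch of one variant, assuming the patterns contain only digits and '.': keep '.' as a branch of its own and search depth-first, trying the literal digit before the wildcard at each position. The first complete match is then the pattern with the longest literal prefix. add_pattern and lookup are hypothetical names, not part of any module.

    use strict;
    use warnings;

    # Store every character of the pattern, including '.', as a tree level,
    # so a pattern only matches numbers of exactly its own length
    sub add_pattern {
        my ($tree, $pattern, $port) = @_;
        my $node = $tree;
        for my $char (split //, $pattern) {
            $node = $node->{$char} ||= {};
        }
        $node->{port} = $port;
    }

    # Depth-first search that tries the literal digit before the '.' branch;
    # returns the port of the first complete match, or undef
    sub lookup {
        my ($node, @digits) = @_;
        return $node->{port} unless @digits;    # undef if no pattern ends here
        my ($digit, @rest) = @digits;
        for my $key ($digit, '.') {
            next unless exists $node->{$key};
            my $port = lookup($node->{$key}, @rest);
            return $port if defined $port;
        }
        return;
    }

    my %tree;
    add_pattern(\%tree, @$_) for ['56..', '1'], ['567.', '2'], ['5...', 'x'];

    for my $number (qw(5600 5679 5700 57)) {
        my $port = lookup(\%tree, split //, $number);
        print "$number---", (defined $port ? "port$port" : 'no match'), "\n";
    }

    With the three patterns from the question this prints port1, port2 and portx for 5600, 5679 and 5700, and "no match" for 57, because no pattern is only two characters long.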

    Perl's payment curve coincides with its learning curve.
Re: Longest match finding.
by graff (Chancellor) on Dec 17, 2008 at 08:10 UTC
    Make sure your patterns are tested in a specific order: longest literal prefix to shortest.
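    # $_ holds the dialed number; the regexes are anchored and use \d,
    # so each one matches a whole 4-digit number only
    if ( /^567\d$/ ) {
        # port 2
    }
    elsif ( /^56\d\d$/ ) {
        # port 1
    }
    elsif ( /^5\d\d\d$/ ) {
        # port x
    }
    else {
        # what do we do in this case?
    }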
    There might be more elegant or compact ways to express that (e.g. using an array of regex patterns), but the basic requirement is to test the patterns in order of relative specificity.
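    The array-of-regexes variant mentioned above might look like this minimal sketch. It assumes the table is written by hand in priority order (most specific first); @routes and route_for are hypothetical names. The first regex that matches wins, so the order of the array carries the same meaning as the order of the if/elsif chain.

    use strict;
    use warnings;

    # Hypothetical routing table: [anchored regex, port], most specific first
    my @routes = (
        [ qr/^567\d$/,   '2' ],
        [ qr/^56\d\d$/,  '1' ],
        [ qr/^5\d\d\d$/, 'x' ],
    );

    # Return the port of the first matching regex, or undef if none match
    sub route_for {
        my ($number) = @_;
        for my $route (@routes) {
            my ($re, $port) = @$route;
            return $port if $number =~ $re;
        }
        return;
    }

    for my $number (qw(5600 5679 5700 6000)) {
        my $port = route_for($number);
        print "$number---", (defined $port ? "port$port" : 'no match'), "\n";
    }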
      But these patterns are not known at coding time; they will be extracted from live gateways.

        Well, how do you as a human know which pattern to use?

        Encode the same criterion in your program.

        Maybe you meant to route to the "most specific" pattern. Then just sort the patterns by specificity:

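        # '.' is a wildcard, so fewer dots means a more specific pattern;
        # sort ascending by dot count (most specific first), ties alphabetically
        my @sorted_patterns = sort {
            my $dots_a = ($a =~ tr[.][.]);
            my $dots_b = ($b =~ tr[.][.]);
            $dots_a <=> $dots_b || $a cmp $b
        } @patterns;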
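        Putting this together at run time, a minimal sketch, assuming the patterns arrive as strings such as 56.. paired with a port and contain only digits and '.': sort them most specific first, turn each one into an anchored regex ('.' becomes \d), and take the first match. %port_for, pattern_to_regex and route_for are hypothetical names; the hash stands in for whatever is extracted from the gateway.

        use strict;
        use warnings;

        # Hypothetical input: patterns and ports as extracted from a gateway
        my %port_for = ( '56..' => '1', '567.' => '2', '5...' => 'x' );
        my @patterns = keys %port_for;

        # Fewest '.' wildcards first, ties broken alphabetically
        my @sorted_patterns = sort {
            ($a =~ tr/.//) <=> ($b =~ tr/.//) || $a cmp $b
        } @patterns;

        # '.' stands for exactly one digit; everything else is literal
        sub pattern_to_regex {
            my ($pattern) = @_;
            my $re = join '', map { $_ eq '.' ? '\d' : quotemeta($_) } split //, $pattern;
            return qr/^$re$/;
        }

        my @routes = map { [ pattern_to_regex($_), $port_for{$_} ] } @sorted_patterns;

        # The first (most specific) matching pattern wins
        sub route_for {
            my ($number) = @_;
            for my $route (@routes) {
                return $route->[1] if $number =~ $route->[0];
            }
            return;
        }

        for my $number (qw(5600 5679 5700)) {
            my $port = route_for($number);
            print "$number---", (defined $port ? "port$port" : 'no match'), "\n";
        }

        Because the regexes are anchored, patterns of different lengths can never both match one number, so the dot count only has to decide between patterns of the same length.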