rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

hey monks!

Say I have two arrays:
@array1 = [ "test", "test2", "test2", "test3", "test4", "test4" ]; @array2 = [ "test", "test", "test2.1", "test4.1", "test4.2", "test4.3" + ];
How would I go about matching which items in each array could match?
i.e.
foreach my $item (sort @array1) { foreach my $found (@array2) { if (&match_names($item,$found)) { print "found a match ($item = $found\n"; push(@save, $found); } # end-if } # end-foreach } # end-foreach foreach my $item (sort @array2) { foreach my $found (@array1) { if (&match_names($item,$found)) { print "found a match ($item = $found\n"; push(@save, $found); } # end-if } # end-foreach } # end-foreach sub match_names { # i have this part, its just a simple regex # BUT this must only match in "one direction": test4 matches test 4. +1, but test4.1 doesnt match test4 return 1 if ($x =~ /some_regex_on_$y/); }
To get the result:
@save contains [ "test", "test2", "test4", "test4" ];
because test appears in both arrays once,
the first or second test2 in array1 matches test2.1 in array2
test4 in array1 could match test4.1, test4.2 or test4.3 twice.

Its not the regex matching that I'm having trouble with, its the "duplicate" matches thats stumping me...

At the moment, I'm getting many duplicates for the obvious reason that my code doesnt distinguish if its found a match and ignore it next time round.
Any help would be appreciated.

Cheers,
Reagen

Updated to more accurately present my problem.

Replies are listed 'Best First'.
Re: multiple matching in arrays
by liverpole (Monsignor) on Aug 01, 2006 at 11:37 UTC
    Hi rsiedl,

    If I understand what you're trying to accomplish, I think you want something like this:

    #!/usr/bin/perl -w + use strict; use warnings; + my @array1 = ( "test", "test2", "test2", "test3", "test4", "test4" ); my @array2 = ( "test", "test", "test2.1", "test4.1", "test4.2", "test4 +.3" ); + my @save; + foreach my $item1 (@array1) { foreach my $item2 (@array2) { next unless defined $item2; if (match_names($item1, $item2)) { push (@save, $item1); $item2 = undef; last; } } } + printf "Results: save = %s\n", join(',', @save); sub match_names { # i have this part, its just a simple regex # something like: my ($x,$y) = @_; return 1 if ($y =~ /$x/); return 0; }
    Several points you should take note of:

    1. You really should use strict and warnings.  They will help you catch so many errors and potential errors that you'd miss otherwise.

    2. I converted array1 and array2 from array references to simple arrays.  If you do this assignment:

    my @array1 = [ "A", "B", "C" ];

    you end up with an array containing a single value, which itself is a reference to an array.  For example:

    use strict; use warnings; + use Data::Dumper; + my @array1 = [ "test", "test2", "test2", "test3", "test4", "test4" ]; printf "Contents of 'array1' => %s\n", Dumper(\@array1); + my @array2 = ( "test", "test2", "test2", "test3", "test4", "test4" ); printf "Contents of 'array2' => %s\n", Dumper(\@array2); # Output will be as follows ... Contents of 'array1' => $VAR1 = [ [ 'test', 'test2', 'test2', 'test3', 'test4', 'test4' ] ]; Contents of 'array2' => $VAR1 = [ 'test', 'test2', 'test2', 'test3', 'test4', 'test4' ];

    3. Your logic in match_names was backwards; what you wanted (at least if one goes by your example output) was:

    return 1 if ($y =~ /$x/); # Swapped $x and $y

    4. In match_names, you should really explicitly return a zero if the regex test fails.  Also, you had a typo on the argument assignment (missing ')' when you assigned to "@_").

    5. The code I presented works by deleting each found item from the second array so it won't be available as a "match" the next time.  If you need to preserve your array2, you should make a copy instead:

    my @array2_copy = @array2; foreach my $item1 (@array1) { foreach my $item2 (@array2_copy) { ... } }

    Is that the solution you were looking for?


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: multiple matching in arrays
by rodion (Chaplain) on Aug 01, 2006 at 12:14 UTC
    When worried about duplicates, use a hash. It's only a slight change to your code.
    foreach my $found (@array2) { if (&match_names($item,$found)) { print "found a match ($item = $found\n"; $found_one{$found} = 1; # *** changed line } # end-if } # end-foreach } # end-foreach @save = keys %found_one;
    This will give you one each of the matches. They wont be in the same order as they were found, however. If that's important, then you have to add
    push(@save, $found) if !exists $found_one{$found}; $found_one{$found} = 1;
    inside the loop, instead of doing keys() at the end to load up @save.

    Also, as liverpole notes, you have a typo in the assignments for the sample arrays, where you use square brackets instead of parens. The way it's written the code won't work.

Re: multiple matching in arrays
by imp (Priest) on Aug 01, 2006 at 12:59 UTC
    Depending on the complexity of match_names (and the size of the arrays being tested) you could also use a regular expression to find the matches.

    But for clarity you are probably best off with the nested loops and hash approach offered by rodion

    sub test_regex { my @array1 = ( "test", "test2", "test2", "test3", "test4", "test4" + ); my @array2 = ( "test", "test", "test2.1", "test4.1", "test4.2", "t +est4.3" ); # Create a list of the items in array 1 separated by | # The reverse sort is used to keep the longer names to the front o +f # the list, in order to avoid matching 'test' when we could match +'test4' my $match_array1 = join('|',reverse sort @array1); # Using the 'x' option with regular expressions helps readability, + # even in simple cases like this my $regex = qr{ ^ ($match_array1) }x; my %found_one = (); foreach my $item2 (@array2) { next unless defined $item2; if ($item2 =~ /$regex/) { $found_one{$1}++; } } return keys %found_one; }
Re: multiple matching in arrays
by Moron (Curate) on Aug 01, 2006 at 11:18 UTC
    If you mean avoid duplicate matching that arises naturally from a cross-product operation, you could restrict the cross product to its unique triangle in the inner loop of a nested 3-argument 'for' construction, having the inner loop variable start off only from the value of the outer variable , e.g.:
    for ( my $i = 0; $i <= $#array1; $i++ ) { for ( my $j=$i; $j <= $#array2; $j++ ) { # compare $array1[$i] with $array2[$j] ... } }

    -M

    Free your mind

Re: multiple matching in arrays
by johngg (Canon) on Aug 01, 2006 at 15:30 UTC
    I had produced a solution for your Compare Lists (Non Unique Intersection) but I didn't post it because other Monks had already given some good answers by the time I was able to finish it. I thought I would adapt what I had done to suit this question. It seems a bit long-winded because it makes a pass looking for an exact match in @array2 for each element in @array1 before looking for the shortest non-exact match. It keeps the order of the arrays unchanged.

    Here's the code

    use strict; use warnings; # Set up data and print it out. # my @array1 = qw(test test2 test2 test3 test4 test4); my @array2 = qw(test test test2.1 test4.1 test4.2 test4.3); print qq{\n}, q {Array @array1}, qq{\n}, qq{ @array1\n}, q {Array @array2}, qq{\n}, qq{ @array2\n\n}; # Pass references to @array1 and @array2 to nonUniqIntersect() # which returns three list references, non-unique intersection, # remainders from @arrau1 and remainders from @array2. # my ($rlNonUniqIntersect, $rlFirstLeftOver, $rlSecondLeftOver) = nonUniqIntersect(\@array1, \@array2); # Print results. # print qq{Non-unique intersection\n}, qq{ @$rlNonUniqIntersect\n\n}, q {Left-over from @array1}, qq{\n}, qq{ @$rlFirstLeftOver\n\n}, q {Left-over from @array2}, qq{\n}, qq{ @$rlSecondLeftOver\n\n}; # Fond non-unique intersection. # sub nonUniqIntersect { # Get first and second lists. Initialise list reference # for intersection and hash tables used to tag which # elements from each list have found a match. # my @first = @{$_[0]}; my @second = @{$_[1]}; my $rlNonUniqInt = []; my %firstTags = map {$_ => 0} 0 .. $#first; my %secondTags = map {$_ => 0} 0 .. $#second; # Iterate over the first list. # FIRST: for my $idx1 (0 .. $#first) { # Look for exact matches by iterating over the # second list. # EXACT_SECOND: for my $idx2 (0 .. $#second) { # Skip if this element already matched elsewhere. # next EXACT_SECOND if $secondTags{$idx2}; # Is this an exact match? # if ($first[$idx1] eq $second[$idx2]) { # Yes, push onto intersection list, tag the # elements in each list and move to next # element of first list. # push @$rlNonUniqInt, $first[$idx1]; $firstTags{$idx1} ++; $secondTags{$idx2} ++; next FIRST; } } # We haven't found an exact match so look for the # shortest possible non-exact match. Make a regex # for this element of first list. Initialise list # of possible matches. # my $rxFirst = qr{^$first[$idx1]}; my @possMatches = (); # Iterate over secondlist again looking for # non-exact matches. # POSS_SECOND: for my $idx2 (0 .. $#second) { # Skip if this element already matched elsewhere. # next POSS_SECOND if $secondTags{$idx2}; # If we find a match, push the length and index # onto the @possMatches list. # next POSS_SECOND unless $second[$idx2] =~ $rxFirst; push @possMatches, [length $second[$idx2], $idx2]; } # Go to next element of first list if no possible # matches found. Otherwise, sort possible matches # by length and take the index of the first (shortest) # and push the corresponding element onto the intersection # list, updating the tags as well. # next FIRST unless @possMatches; my $shortestMatchIdx2 = ( map {$_->[1]} sort {$a->[0] <=> $b->[0]} @possMatches )[0]; push @$rlNonUniqInt, $first[$idx1]; $firstTags{$idx1} ++; $secondTags{$shortestMatchIdx2} ++; } # Splice out elements from each list that have been # paired off leaving those that didn't have matches. # foreach my $idx (sort {$b <=> $a} keys %firstTags) { splice @first, $idx, 1 if $firstTags{$idx}; } foreach my $idx (sort {$b <=> $a} keys %secondTags) { splice @second, $idx, 1 if $secondTags{$idx}; } # Return three list references. # return ($rlNonUniqInt, \@first, \@second); }

    and here's the output

    Array @array1 test test2 test2 test3 test4 test4 Array @array2 test test test2.1 test4.1 test4.2 test4.3 Non-unique intersection test test2 test4 test4 Left-over from @array1 test2 test3 Left-over from @array2 test test4.3

    I hope this is of use.

    Cheers,

    JohnGG

Re: multiple matching in arrays
by Mandrake (Chaplain) on Aug 01, 2006 at 13:32 UTC
    Please try this. Similar to the answer I had posted for one of your previous question (again related to comparision of arrays)
    my @result_arr; (in_array($_,\@array2)) && (push @result_arr, $_) for (@array1); sub in_array { (($_[0] eq $_) ||($_=~m/$_[0]/)) && return ($_=" ") for(@{$_[1]}); }
    Thanks.