Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Array Comparison

by mcogan1966 (Monk)
on Dec 22, 2003 at 18:24 UTC ( [id://316435]=perlquestion: print w/replies, xml ) Need Help??

mcogan1966 has asked for the wisdom of the Perl Monks concerning the following question:

I have the need to compare 2 arrays. If an element in array exists in array 1 and array 2, remove it from array 1. Right now, I'm doing this by a standard nested loop method, but this is taking way to long. What I'm looking for is a speedy and concise method to do the following:
WORDS: for($x=0; $x<@words; $x++) { foreach(@exclude) { if (($words[$x] =~ m/\W) || # remove non-words ((length($words[$x])) < 4) || # ignore words less than +4 characters (($x > 0) && (lc($words[$x]) eq lc($words[$x-1]))) || # + drop duplicates (lc($words[$x]) eq lc($_)) ) { splice (@words, $x, 1); $x--; next WORDS; } } }
I should note that @words is sorted before this loop begins. That doesn't seem to be creating any slowness in the code, as @words tends to be small in comparison.

Right now, it can take 2-3 seconds each time this loop is running. There as GOT to be a better way to do this than what I have. It works, but not well enough. Must be faster.

Replies are listed 'Best First'.
Re: Array Comparison
by chromatic (Archbishop) on Dec 22, 2003 at 18:32 UTC

    That nested loop is the killer. How about a hash? The following code is untested and unoptimized but performs better algorithmically:

    my %exclude; @exclude{ @exclude } = (); my @cleaned = grep { exists $exclude{ $_ } ? () : $_ } @words;

    Update: Yep, I confused grep with map in my pre-breakfast haste. That should rather be:

    my @cleaned = grep { ! exists $exclude{ $_ } } @words;
      I think that should be
      my @cleaned = grep { not exists $exclude{$_} } @words;
      Grep returns the original value for every element for which the expression returns true. Your ternary operator would be useful in map, but not here.

      The PerlMonk tr/// Advocate
        Excellent, this works much faster.

        Now, if I can just sort and make the list with just unique entries, I'm all set. Thanks.

Re: Array Comparison
by Ovid (Cardinal) on Dec 22, 2003 at 18:37 UTC

    Untested:

    my @words = qw(foo bar baz quux Ovid); my @exclude = qw(bar Ovid); my %bad_words; @bad_words{@exclude} = @words[@exclude]; @words = grep { ! exists $bad_words{$_} } @words;

    Cheers,
    Ovid

    New address of my CGI Course.

Re: Array Comparison
by jZed (Prior) on Dec 22, 2003 at 18:41 UTC
    @x = qw (1 2 3); @y = qw (3 4 5); my %v = map {$_=>1} @x; my %w = map {$_=>1} @y; for (@x,@y) { push @z,$_ if $v{$_} and !$w{$_} } print (@z);
Re: Array Comparison
by Roy Johnson (Monsignor) on Dec 22, 2003 at 19:00 UTC
    Your code does several things more than you describe. In addition to removing (case-insensitive) duplicates, it also has some other criteria for elimination, which should not be inside the foreach.

    To duplicate your code functionally, we have to:

    1. Remove non-words (you missed a closing / in that pattern, by the way)
    2. Remove words shorter than 4 chars
    3. Remove words that appear more than once (your technique suggests that the list is sorted)
    4. Remove words that appear in @exclude
    Ok, here we go!
    # given @words and @exclude my %seen; @seen{@exclude} = (1) x @exclude; @words = grep { (! /\W/) and (length() >= 4) and ($seen{$_}++ == 0) } @words;
    We pre-load %seen with a flag for each word in @exclude. Then, as we're looking through @words itself, we mark each element as seen as well.

    The PerlMonk tr/// Advocate
      Very nice. Does exactly what I need, and quickly too. Only thing left for this is to make the comparison non-case-dependent and it's perfect. I'll play around with this a touch. Thanks a bunch.

      Got it by making the arrays all lower-case when they are set up. Nice and quick.

Re: Array Comparison
by Not_a_Number (Prior) on Dec 22, 2003 at 20:11 UTC

    To solve your case problem, try this:

    use strict; use warnings; my @words = qw ( a ants cats cats dogs DOGS rabbits r@bbits turtles wa +rthogs ); my @exclude = qw ( ants Turtles WARTHOGS ); my %exclude = map { lc $_ => '1' } @exclude; for ( @words ) { next if /\W/ or length $_ < 4 or $exclude{lc $_}++; print "$_\n"; }

    dave

      Actually, @words and @exclude are generated by reading outside sources. I just dropped the lc() right where the data is being read and it works just fine. Thank you for the suggestion though.
        Actually, @words and @exclude are generated by reading outside sources.

        Well, yeah, that's what I assumed. I just added those arrays for testing. I even assumed that you might not have any control over the 'outside sources', which is why I did the lc() stuff in my snippet :-)

        dave

Re: Array Comparison
by duff (Parson) on Dec 22, 2003 at 18:43 UTC
      No, it's not the intersection I want. And the FAQ doesn't give me the answer I'm looking for either.

      What I need is array 1, minus any elements it has in common with array 2, and the comparison has to be case insensitive.

      Your suggestion does not point to that kind of solution.

Re: Array Comparison
by thraxil (Prior) on Dec 22, 2003 at 23:15 UTC
      Darn right it is. Wow, I had no idea. But now I do know the answer to that question, how and why. So, that's one point for me toward getting that job, whatever it is. Too bad I'm happy where I am right now.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://316435]
Approved by gjb
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-03-28 14:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found