wilbur has asked for the wisdom of the Perl Monks concerning the following question:

There may be a module for this but I'd like to figure it out myself(or maybe with a little help) Say you have two sets of lines of text, and you need to find all lines in the first list that either doesn't match any line in the second list or if there are lines missing in the first list(lines in the second list that should be in the first). I'm doing the first part with a bunch of nested if's(I know it's horrible).and the second part with if,then,else statements. How do I replace the nested if's and do the same thing, for example: if test1 != record1 and =! record2, and =! record3,4,5.... do something.

Replies are listed 'Best First'.
Re: ugly nested if's
by dragonchild (Archbishop) on Apr 22, 2004 at 15:18 UTC
    You're looking for one of two things. You can either create a hash with the keys being the second list or you can use grep.
    my @list1 = get_from_some_place(); my @list2 = get_from_some_other_place(); -------- # Hash method my %lookup_hash; @lookup_hash{@list2} = undef; # This is a hashslice. We don't care abo +ut the values. unless (exists $lookup_hash{$list1[0]}) { # Do something here } -------- # grep method unless (grep { $_ eq $list1[0] } @list2) { # Do something here }

    In general, the hash is faster, but takes more memory.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      Both methods only check whether the first entry of the first list occurs in the second list. I highly doubt that your hash method is faster in such a case.

      However, it was my understanding the OP wanted to know whether the first list contains entries that aren't in the second list. This is a FAQ, and a hash solution ought to be used, as that gives you linear behaviour (assuming all your hash inserts go in constant time on average). A grep method will be quadratic, and for any list that doesn't have a trivial size, it should not be taking into consideration. After all, both methods use a linear amount of memory.

      Abigail

      Just a nitpick: I do not like your statement with the assignment to the hash slice.
      @lookup_hash{@list2} = undef;
      It'll do exactly what you want, but only in this particular case. It exhibits a lack of elegance: it looks like it's doing something other than what it's really doing. Let me demonstrate what I mean, by assigning a different value:
      my @list2 = qw(one two three); my %lookup_hash; @lookup_hash{@list2} = 'foo'; use Data::Dumper; print Dumper \%lookup_hash;
      Result:
      $VAR1 = { 'three' => undef, 'one' => 'foo', 'two' => undef };
      If you were expecting to see "foo" for every value, think again. You are assigning an explicit value (in your code, undef) to the first hash element in the slice, and an implicit undef to all the others.

      As for better idioms (IMO)... In this particular case, I'd assign an implied undef for all hash elements, no exceptions:

      @lookup_hash{@list2} = ();
      In order to assign the same explicit value to all:
      @lookup_hash{@list2} = ('foo') x @list2;

      --
      Thank you. I'll get off my soap box now.

        Actually, the idiom I wanted to use was
        undef @lookup_hash{@list};

        But, I've found that it fails for symbolic references living in packages other than the one you're executing in under some builds of Perl. Doing an explicit assignment (either with undef or the empty list) works in all builds I've tried.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

Re: ugly nested if's
by sgifford (Prior) on Apr 22, 2004 at 16:29 UTC

    A common way to solve this is to sort both lists. Then you can alternately advance through each list to see what items are in both, in just the first, or in just the second.

    For example:

    #!/usr/bin/perl -w use strict; my @red_things = qw(apple little_corvette strawberry lava); my @fruits = qw(apple pineapple strawberry banana); my @sorted_red_things = sort @red_things; my @sorted_fruits = sort @fruits; while (@sorted_red_things || @sorted_fruits) { my($thing,$fruit)=($sorted_red_things[0],$sorted_fruits[0]); if (defined($thing) and (!defined($fruit) or ($thing lt $fruit))) { print "A red thing that's not a fruit: $thing\n"; shift @sorted_red_things; } elsif (!defined($thing) or ($thing gt $fruit)) { print "A fruit that's not red: $fruit\n"; shift @sorted_fruits; } elsif ($thing eq $fruit) { # It's in both lists shift @sorted_red_things; shift @sorted_fruits; } }

    You can sort into files to conserve memory.

    The runtime of this will be pretty good. Total time is time to sort first list + time to sort second list + time for scan, which is O(n lg n + n lg n + 2n) = O(n lg n).

Re: ugly nested if's
by Roy Johnson (Monsignor) on Apr 22, 2004 at 16:09 UTC
    You might look at the code in List module providing any, all and none. It sounds to me like you want to get any(@list1) != all(@list2). I haven't really played with it.

    The PerlMonk tr/// Advocate

      You could certianly use it but it won't work if you care about order. my $inlistAonly = any(@listA) ne any(@listB) That would return any elements of listA that have no corresponding partner in listB and you could do the same in reverse. I'm not sure thats realy what the OP was asking for though.


      ___________
      Eric Hodges