cmv has asked for the wisdom of the Perl Monks concerning the following question:

Folks-

I actually have 2 questions:
1.) Is using map here the most efficient way to do this (for much larger lists)?
Update: I expect to run into memory problems when I start testing with a full dataset. ++ikegami thanks for the suggestion

2.) How can I get rid of the null list element in the result of the map example? The if statement doesn't seem to be working. I suspect I need some more help understanding how map works (I'm still trying to get my head around it).
Update: Thanks to all for the various suggestions:
++stiller - Thanks for the reminder that perl grep is useful here too, my mind hasn't yet been retreaded off of unix grep!
++andreas1234567 - Thanks for the first for loop suggestion. I should consider simple constructs, but map is just really cool.
++kyle - Nice explanation on what I was missing. Thanks teacher!
++Corion - I could use tr in this simplified example (my bad), but the actual work needed isn't tr-able.

#!/opt/exp/bin/perl5.8
use strict;
use warnings;
use Data::Dumper;

my @a = ( qw (onexxx txwxo txhrexe xfourx xxx five) );
my @b = map { $_ =~ s/x//g; $_ if ($_); } @a;
print Dumper(\@b);

Replies are listed 'Best First'.
Re: Basic list manipulation, and a bit of map() confusion...
by kyle (Abbot) on Feb 24, 2008 at 15:03 UTC

    The reason your map gives you the empty string in spite of your if is that it always gives you the last expression evaluated, much like do or a sub. Consider this timeless classic (from Evil Interview Questions):

    sub baz { return 11 unless shift }
    print 'baz(5): ', baz(5), "\n";
    print 'baz(0): ', baz(0), "\n";
    __END__
    baz(5): 5
    baz(0): 11

    If you want to take something out of a list processed with map, have the block explicitly return an empty list.

    my @a = ( qw (onexxx txwxo txhrexe xfourx xxx five) );
    my @b = map { $_ =~ s/x//g; $_ ? $_ : (); } @a;
    print Dumper(\@b);
    __END__
    $VAR1 = [ 'one', 'two', 'three', 'four', 'five' ];

    As a side note, you should know that modification of $_ inside map modifies the original list element. Your code (and mine) leaves @a looking like this:

    $VAR1 = [ 'one', 'two', 'three', 'four', '', 'five' ];
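One way to avoid the aliasing side effect described above is to copy each element into a lexical before editing it. This is only a sketch: the variable name $s is an illustrative choice, and the empty-string test is written as `ne ''` rather than a plain truth test.

```perl
use strict;
use warnings;

my @a = qw(onexxx txwxo txhrexe xfourx xxx five);

# Copy each element into a lexical ($s, a name chosen for this sketch)
# before editing, so the alias $_ (and therefore @a) is never modified.
my @b = map {
    my $s = $_;
    $s =~ s/x//g;
    $s ne '' ? $s : ();    # an empty list drops the element
} @a;

print "b: @b\n";
print "a: @a\n";
```

With the copy in place, @a should still hold the original six strings after the map has run.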

    You can use Benchmark to try to find the most efficient way to do something, but for something like this I think you'd have to have a pretty long list before it would make a significant difference. It's generally best to use the most easily understood code until profiling tells you that something is worth optimizing.
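A comparison of that kind could be set up roughly as follows with the core Benchmark module. The data set (the sample list repeated 1000 times) and the sub names are assumptions made for this sketch; real timings depend entirely on the actual data and perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the sample list repeated many times.
my @data = (qw(onexxx txwxo txhrexe xfourx xxx five)) x 1000;

sub with_map {
    return [ map { my $s = $_; $s =~ s/x//g; $s ne '' ? $s : () } @data ];
}

sub with_grep {
    my @copy = @data;    # s/// inside grep would otherwise edit @data itself
    return [ grep { s/x//g; $_ ne '' } @copy ];
}

sub with_for {
    my @out;
    for my $item (@data) {
        (my $s = $item) =~ s/x//g;
        push @out, $s if $s ne '';
    }
    return \@out;
}

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds; cmpthese then prints a comparison table.
cmpthese(-1, {
    'map'  => \&with_map,
    'grep' => \&with_grep,
    'for'  => \&with_for,
});
```

Each sub works on copies so that every iteration starts from the same input; the copy in with_grep is part of what gets timed, which is worth keeping in mind when reading the table.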

Re: Basic list manipulation, and a bit of map() confusion...
by stiller (Friar) on Feb 24, 2008 at 14:33 UTC
    Hi, I wouldn't know much about efficiency, you'll have to test both. But I would use grep rather than map for this:

    my @b = grep { s/x//g; $_ ne ''; } @a;
    hth
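The `$_ ne ''` test above is not quite the same as a plain truth test such as `$_ ? $_ : ()`: the string '0' is false in Perl but is not empty. The sketch below illustrates the difference using a hypothetical input, 'x0x', that is not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical input: 'x0x' becomes the string '0' once the x's are removed.
my @input = qw(onexxx x0x xxx five);

# Strip the x's from copies so @input stays untouched.
my @stripped = map { my $s = $_; $s =~ s/x//g; $s } @input;

my @by_truth  = grep { $_ }       @stripped;   # drops '' and '0'
my @by_string = grep { $_ ne '' } @stripped;   # drops only ''

print "truth test:  @by_truth\n";
print "string test: @by_string\n";
```

Which test is appropriate depends on whether a literal '0' can be a meaningful result in the real data.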
Re: Basic list manipulation, and a bit of map() confusion...
by Corion (Patriarch) on Feb 24, 2008 at 15:05 UTC

    Personally I would use a grep before or after the map, but when you want to return a varying number of elements from map you can use the ternary operator:

    my @b = map {
        s/x//; # this could be written more efficient as tr[x][]d;
        $_ ? $_ : ()
    } @a;
      s/x//; # this could be written more efficient as tr[x][]d;

      The difference being that s/x// removes one 'x' character while tr[x][]d removes all 'x' characters. To be fair, the OP's code used s/x//g instead :-)

      And you don't need two statements to accomplish that:

      my @b = map s/x//g ? $_ : (), @a;
      # Or:
      my @b = map tr/x//d ? $_ : (), @a;

        Not exactly:

        perl -le "print for map s/x//g ? $_ : (), @ARGV" 1 2 3x 4xx xxx 6x
        3
        4

        6
        perl -le "print for map tr/x//d ? $_ : (), @ARGV" 1 2 3x 4xx xxx 6x
        3
        4

        6

        The return value of s/// and tr[] is the number of substitutions made, not the modified string (unfortunately).
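On newer perls there is a way around this. Assuming perl 5.14 or later is available (the original post targets perl 5.8, where this does not work), the /r modifier makes s/// return the modified copy instead of a count, and it leaves the original untouched. A minimal sketch:

```perl
use strict;
use warnings;
use 5.014;   # the /r modifier needs perl 5.14 or later

my @a = qw(onexxx txwxo txhrexe xfourx xxx five);

# s/x//gr returns the edited copy and leaves $_ (and so @a) alone;
# the grep then drops the empty string produced from 'xxx'.
my @b = grep { $_ ne '' } map { s/x//gr } @a;

print "b: @b\n";
print "a: @a\n";
```

The same modifier is available for tr (for example tr/x//dr) from 5.14 onward.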

Re: Basic list manipulation, and a bit of map() confusion...
by tilly (Archbishop) on Feb 24, 2008 at 15:30 UTC
    An important point that nobody else mentioned: take a look at @a after that loop. It has been changed as well. If you wish to avoid that, you need to copy $_ to a private variable inside the map before you manipulate it.

    Oops: kyle points out that he had mentioned it. I scanned for it and thought I didn't see it. Oops. But it still bears repeating.

Re: Basic list manipulation, and a bit of map() confusion...
by andreas1234567 (Vicar) on Feb 24, 2008 at 14:35 UTC
    I will leave it to others to discuss list operator performance. To remove the undefined/empty element I'd use for and push:
    $ cat 669837.pl
    # 669837
    use strict;
    use warnings;
    use Data::Dumper;
    my @a = qw (onexxx txwxo txhrexe xfourx xxx five);
    my @b = ();
    for (@a) {
        s/x//g;
        push @b, $_ if $_;
    }
    print Dumper(\@b);
    __END__
    $ perl 669837.pl
    $VAR1 = [ 'one', 'two', 'three', 'four', 'five' ];
    --
    Andreas
Re: Basic list manipulation, and a bit of map() confusion...
by ikegami (Patriarch) on Feb 24, 2008 at 15:39 UTC

    Is using map here the most efficient way to do this (for much larger lists)?

    What kind of problem are you having? Speed? Memory? Something else?

    If you want to reduce your memory footprint, you could use a counting loop and push.

    for my $i (0..$#a) {
        my $s = $a[$i];
        $s =~ s/x//g;
        push @b, $s if $s;
    }

    As for the speed difference between this and your solution, only a benchmark will tell. Should be very similar.

      "for" over an array is optimized; this works just as well from a memory perspective:
      for my $a_val (@a) {
          (my $s = $a_val) =~ s/x//g;
          push @b, $s if $s;
      }
        Thanks. I suspected as much — I already realized naught-but-an-array was special — but I didn't have time to check it when I posted the grandparent.
Re: Basic list manipulation, and a bit of map() confusion...
by Prof Vince (Friar) on Feb 24, 2008 at 20:19 UTC
    I'm not sure if you're aware of it, but your code modifies @a since $_ in map and for aliases the current list element. If you don't want this, you'd have to localize $_ or to copy the current element to a new lexical variable.