eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

In a shell script, I want to remove all duplicate elements from various PATH-like environment variables, preserving the original order of the elements. Here is what I have so far:

#!/bin/sh
MYPATH="/abc/def:fred:bill g:jock:bill g:/abc/def:/abc/def"
echo "MYPATH ='$MYPATH'"
MYPATH=`perl -e'my%s;print join":",grep(!$s{$_}++,split(/:/,shift))' "$MYPATH"`
echo "MYPATH2='$MYPATH'"

Though I'm fairly happy with that solution, if I've blundered or you can see a better Perl (or non-Perl) way to do it, please respond away. Because my Unix has become a bit rusty, I'm concerned I may be overlooking a "standard" Unix way to do this. (A quick google uncovered this awk path cleaner but I didn't find a standard Unix command for this).

Replies are listed 'Best First'.
Re: Perl one-liner to remove duplicate entries from PATH
by parv (Parson) on Jan 12, 2006 at 06:41 UTC

    See Tad McClellan's reply in the "regex to clean path" thread on comp.lang.perl.misc.

    (A few seconds after posting...) Never mind, that is what you have posted anyway. After reading "In a shell script" and #!/bin/sh, I stopped reading. That will teach me to be the first responder.

Re: Perl one-liner to remove duplicate entries from PATH
by blazar (Canon) on Jan 12, 2006 at 10:43 UTC

    Well, it seems to me that you do have a working solution, and to add to the other answers you got:

    • Since this is a one-liner and you're not under strict anyway, you don't need my %s;
    • You can access environment variables directly in perl through %ENV:
      MYPATH=$(perl -e'print join":",grep!$s{$_}++,split/:/,$ENV{MYPATH}')
      (I don't like backticks, be it in Perl or shell!)
    • else you may take advantage of -a:
      MYPATH=$(echo "$MYPATH"|perl -F: -lape'$_=join":",grep!$s{$_}++,@F')

    However, be warned: I'm now playing golf more than answering your actual question...

      Or a variant using colon-terminated lines rather than colon-separated fields, allowing the uniqueness code itself to be very straightforward:
      MYPATH=$(echo -n "$PATH"|perl -072 -lne 'print unless $s{$_}++')
      (octal 072 is ":", the colon character)

      Because this is interpreting the entries in $PATH as colon-terminated rather than colon-separated, though, this will tack on an unwanted colon at the end.

      So the full solution wouldn't be as pretty. You'd need something like:

      MYPATH=$(echo -n "$PATH"|perl -072 -lne 'print unless $s{$_}++')
      MYPATH="${MYPATH%:}"
Re: Perl one-liner to remove duplicate entries from PATH
by sh1tn (Priest) on Jan 12, 2006 at 07:37 UTC
    @PATH = qw(one 1 two 2 two 2); %PATH = @PATH;
    or just take a look at perldsc.


      But the OP specified "preserving the original order of the elements". Hashes don't preserve order.


      s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
        my %path; $path{$_}=$path{$_} || 1 + keys %path for split /$sep/,$path;

        There are other, more efficient ways to do that, btw; I'm just feeling moderately cheeky. ;-)

        And for the curious: you can't write that as ||= because of a bug in Perl (on some versions of Perl, anyway).

        Update:

        Or maybe it's not a bug. According to some, the docs should be read to say that it should be valid to say

        my %path; $path{$_}||=keys %path for split /$sep/,$path;

        and some are saying that the results are undefined. sigh

        ---
        $world=~s/war/peace/g