Levan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Fellow Monks, I have a question regarding how to use regular expressions to get rid of repeated cases. Firstly, I have used some regular expressions to get the following list of C function names:

RcvChar
checkeof
SerialOutputString
SerialOutputString
SerialOutputString
GetFile
ChksumByteByByte
GetChksum
SerialOutputString
SerialOutputString

These are actually function names that I need to use for stubbing in order to do some unit testing. So is there any way that I can get rid of the repeated cases?
The code that I used to get the list is:

open (Mer, "Extracted.c");
for $mer (<Mer>) {
    $control = 0;
    while ($control != 15) {
        $mer =~ s/ //;
        $control++;
    }
    print $mer;
}

update (broquaint): title change (was Getting Rid of Repeated Case)

Replies are listed 'Best First'.
Re: Removing duplicates from list
by davido (Cardinal) on Oct 14, 2003 at 02:48 UTC
    That's what hashes are good at. Since keys have to be unique, just plop your parsed version of $mer into a hash as the hash's key. Value is unimportant, though you could use it as a counter.

    Example just tweaking your code slightly:

    use strict;
    use warnings;

    open (MER, "Extracted.c") or die "Can't open input file. $!\n";

    my %cases;
    for my $mer (<MER>) {
        my $control = 0;
        while ($control != 15) {
            $mer =~ s/ //;
            $control++;
        }
        $cases{$mer}++;
    }

    foreach my $key ( keys %cases ) {
        print $key, "\n";
    }

    I'm not really clear on what the inner loop is for. You only want to remove the first fifteen spaces from $mer? ...ok. I guess that's working out ok. You could alternatively use something like:

    substr($mer,0,15) =~ s/ //g;

    That would eliminate the while loop and $control counter.


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
      hi,
      Thanks for your advice, I have got it to work. By the way, the while loop is to help me align the list to the left as there are some spaces in front. Your code really helps!!!
      a thousand thanks!!!
      Levan
Re: Removing duplicates from list
by Zaxo (Archbishop) on Oct 14, 2003 at 02:48 UTC

    The usual way of removing duplicates from a list is to form a hash. Instead of printing in line 8, do

        $fnames{$mer} = '';

    Then you can extract sort keys %fnames to be printed, or for whatever other use you have.
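    The hash approach above can be sketched end to end; the name list here is just a stand-in for the parsed output from Extracted.c:

```perl
use strict;
use warnings;

# Stand-in for the function names parsed out of the C file
my @names = qw(RcvChar checkeof SerialOutputString
               SerialOutputString GetFile SerialOutputString);

# Hash keys are unique, so assigning each name as a key
# collapses the duplicates automatically.
my %fnames;
$fnames{$_} = '' for @names;

# Print each unique name once, sorted
print "$_\n" for sort keys %fnames;
```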

    After Compline,
    Zaxo

Re: Removing duplicates from list
by Roger (Parson) on Oct 14, 2003 at 02:48 UTC
    All you need to do is put the names in a hash and print them out later (the quick fix) -
    open (Mer, "Extracted.c");
    my %vars;
    for $mer (<Mer>) {
        $control = 0;
        while ($control != 15) {
            $mer =~ s/ //;
            $control++;
        }
        $vars{$mer} = 1;
    }
    print "$_" foreach (keys %vars);
    Another method is to use the hash to eliminate duplicates on the fly -
    open (Mer, "Extracted.c");
    my %vars;
    for $mer (<Mer>) {
        $control = 0;
        while ($control != 15) {
            $mer =~ s/ //;
            $control++;
        }
        next if $vars{$mer};
        $vars{$mer} = 1;
        print $mer;
    }
    The following is how I would do this -
    use strict;
    use IO::File;

    my $Mer = new IO::File "Extracted.c", "r"
        or die "Can not open file!";

    my %vars;
    while (my $mer = <$Mer>) {
        ...
    }
    Try to use IO::File to open a file in perl. It's the preferred method. ;-)

      That last statement is pretty odd. open() has had new enhancements in just about every perl release... why would that happen if it were deprecated?

      Try to use IO::File to open a file in perl. It's the preferred method. ;-)

      It is? It may be one method but I don't think it's the preferred method.

      -- vek --
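      For comparison, the three-argument open with a lexical filehandle (available since perl 5.6) is another common alternative to both bareword handles and IO::File; a minimal sketch, assuming Extracted.c exists as in the original post:

```perl
use strict;
use warnings;

# Three-argument open with a lexical filehandle: no bareword,
# no package-global handle, and the mode is stated explicitly.
open my $mer_fh, '<', 'Extracted.c'
    or die "Can't open Extracted.c: $!";

my %seen;
while ( my $line = <$mer_fh> ) {
    print $line unless $seen{$line}++;   # print each line only once
}

close $mer_fh;
```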
Re: Removing duplicates from list
by etcshadow (Priest) on Oct 14, 2003 at 02:59 UTC
    (Not a perl answer, but still:) Pipe the output to uniq:
    perl my_script.pl | uniq
    Perl is a fantastic language for writing one-liners, scripts, and full-on applications, but the shell and common shell utilities are incredibly useful, too.

    Correction:

    perl my_script.pl | sort | uniq

    ------------
    :Wq
    Not an editor command: Wq

      uniq will only work if the items are sorted (or at least if all of the identical items are consecutive).

      $ printf "a\nb\na\n" |uniq
      a
      b
      a
      

      sort |uniq would work, though, or sort -u.
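      The same quick check confirms that both sorted variants collapse the duplicates:

```shell
printf "a\nb\na\n" | sort | uniq
printf "a\nb\na\n" | sort -u
```

      Each pipeline prints just `a` and `b`, one per line.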

Re: Removing duplicates from list
by thor (Priest) on Oct 14, 2003 at 12:50 UTC
    Could you use the ctags program? I think that ctags is one of those magical tools on Unix that doesn't get enough play.

    thor

Re: Removing duplicates from list
by flounder99 (Friar) on Oct 14, 2003 at 15:50 UTC
    davido has the best idea but if you want to use a regex you can use something like:
    my $list = "RcvChar checkeof SerialOutputString SerialOutputString SerialOutputString GetFile ChksumByteByByte GetChksum SerialOutputString SerialOutputString";
    while ($list =~ s/(\b\w+\b)(.*?)\s+\1\b/$1$2/s) {}
    print $list;
    outputs :
    RcvChar checkeof SerialOutputString GetFile ChksumByteByByte GetChksum

    --

    flounder

Re: Removing duplicates from list
by vek (Prior) on Oct 15, 2003 at 00:44 UTC

    You've received some good suggestions from other monks so I just had a couple of comments about your code.

    open (Mer, "Extracted.c");

    Always try and get in the habit of checking the return value of open:

    open (MER, "Extracted.c") || die "open: Extracted.c - $!\n";

    Your for loop looks a little out of place for processing each line in the file. I dunno, whatever floats your boat I suppose, but I think you'll probably see most people use a while loop:

    while (<MER>) {
        # do stuff
    }
    -- vek --
      hi,
      Actually the code that I have put up is a shorter version of what I actually had. Cos I think the full code is a bit too long and complicated to be put up on this thread, I shortened it. Thanks for the advice!!!
      Levan