Removing duplicates

RCP has asked for the wisdom of the Perl Monks concerning the following question:

Got great help earlier, now have a couple of questions.

QUESTION #1

file1 data:

# This is a list of selected nets
Net 'GROUND' top side
Net 'GROUND' bottom side
Net '/ad1'   top side
Net '/ad1'   bottom side
Net '/VCCA'  top side
Net '/VCCA'  bottom side

a. I need to find all lines that have the word "Net" in file1

b. remove lines that contain duplicates found in the 2nd column only, and write the result to a file called "file2" that looks like this:

'GROUND'
'/ad1'
'/VCCA'

QUESTION #2

How can I take a file like "file2" and have perl read the column and strip out any lines in another file that are contained in the file2 listing?

While I have similar scripts written in UNIX (Korn), I'm trying to learn PERL's equivalent code.

Thanks... RCP

edit by thelenm: added tags.


Replies are listed 'Best First'.
Re: Removing duplicates by revdiablo (Prior) on Feb 25, 2004 at 22:08 UTC
Update: You might want to read `perldoc -q duplicate` for a general discussion of using a hash to check for duplicates.

I know giving a complete solution is sometimes frowned upon here, but sometimes I just can't help myself. Here is the first script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %nets;

    # Print each net name the first time it is seen.
    /^Net '([^']+)'/ and not $nets{$1}++ and print "$1\n" while <STDIN>;

And the second:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp;

    my $file2 = shift or die "Usage: $0 file2 < another-file";

    # Join the names into one alternation; quotemeta keeps them literal.
    my $nets = join "|", map { chomp; quotemeta($_) } read_file($file2);

    # Print only the lines that mention none of the names.
    /$nets/ or print while <STDIN>;

And you can run the whole chain with something like:

    perl script1.pl < file1 > file2
    perl script2.pl file2 < another-file

Update: just for fun, here's a version that will do everything in one step:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp;

    my $file1 = shift or die "Usage: $0 file1 < another-file";

    # Pull the net names straight out of file1 and build the alternation.
    my $nets = join "|",
        map { chomp; /^Net '([^']+)'/ ? quotemeta($1) : () } read_file($file1);

    /$nets/ or print while <STDIN>;

Which would be used like:

    perl script1+2.pl file1 < another-file
Re: Re: Removing duplicates by RCP (Acolyte) on Mar 03, 2004 at 11:50 UTC
Thanks for your help. One problem, though: my HP-UX system does not have File::Slurp. Is there another approach to this step? Your first part worked great. I can't wait to start PERL class! My shell scripts were taking minutes to do the things that PERL got done in mere seconds.

Thanks again! RCP
Re: Re: Re: Removing duplicates by revdiablo (Prior) on Mar 03, 2004 at 19:28 UTC
"another approach to this step?"

Of course! :-) The idiomatic slurp goes something like this:

    my $data = do { local (@ARGV, $/) = $filename; <> };

You may see why I chose to use File::Slurp originally. To integrate that into the second snippet, it would be:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = shift or die "Usage: $0 file2 < another-file";

    # Slurp the whole file into a single string.
    my $file_contents = do { local (@ARGV, $/) = $file; <> };

    # Split it back into lines so each name becomes its own alternative.
    my $nets = join "|",
        map { quotemeta($_) } grep { length($_) } split /\n/, $file_contents;

    /$nets/ or print while <STDIN>;

Note: this code is untested.

Update: please note that some people get upset when you write "PERL." The language is called Perl, and the program that executes Perl code is called `perl`. Perl is not an acronym, so capitalizing its name makes it look like you're shouting.
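One thing to watch with this idiom: with `$/` undefined, the read hands back the whole file as a single string, so a list of names has to come from splitting that string or from reading in list context. Here is a minimal sketch of the difference, assuming Perl 5.8 or later for the in-memory filehandles. The names `$listing` and `build_pattern` are made up for illustration and are not part of the scripts in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of file2.
my $listing = "dave\nrich\n";

# Scalar context with $/ undefined: the whole file arrives as one string.
open my $fh, '<', \$listing or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my @from_string = split /\n/, $whole;

# List context with the default $/: one element per line.
open $fh, '<', \$listing or die "Cannot open in-memory file: $!";
my @from_list = <$fh>;
close $fh;
chomp @from_list;

# Hypothetical helper: turn a list of names into one literal alternation.
sub build_pattern {
    my @names = grep { length($_) } @_;
    return qr/(?!)/ unless @names;    # no names: a pattern that never matches
    my $alternation = join '|', map { quotemeta($_) } @names;
    return qr/$alternation/;
}

# Either list works as input; print the names that are not excluded.
my $exclude = build_pattern(@from_string);
for my $name (qw(dave bob rich jim)) {
    print "$name\n" unless $name =~ $exclude;
}
```

Note that the pattern matches anywhere in a line, so `rich` would also exclude a line containing `richard`. If whole-line matches are wanted, anchoring the alternation (for example `qr/^(?:$alternation)$/`) is one option.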
Re: Re: Re: Re: Removing duplicates by RCP (Acolyte) on Mar 04, 2004 at 12:33 UTC
Re: Removing duplicates by talexb (Chancellor) on Feb 25, 2004 at 20:07 UTC
I'm finding it hard to visualize what you are trying to do. Can you modify your original post so that it's more obvious?

To figure out which thing is in one file but not another, put the contents of each file into separate hashes, and using the keys from the first hash, check the second hash. It's pretty standard Perl Cookbook (as published by O'Reilly) stuff.

Alex / talexb / Toronto

Life is short: get busy!
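A rough sketch of that two-hash idea, in case it helps. The sample arrays stand in for the lines of the two files, and `only_in_first` is a hypothetical name, not something from the Cookbook.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of each file.
my @first  = qw(dave bob rich jim);
my @second = qw(dave rich);

# Return the entries of the first list that do not appear in the second.
sub only_in_first {
    my ($first_ref, $second_ref) = @_;

    # One hash per file, keyed on the line contents.
    my %in_first  = map { $_ => 1 } @$first_ref;
    my %in_second = map { $_ => 1 } @$second_ref;

    # Walk the keys of the first hash and check each against the second.
    my @only = grep { !exists $in_second{$_} } keys %in_first;

    # Hash keys come back in no particular order, so sort for stable output.
    @only = sort @only;
    return @only;
}

print "$_\n" for only_in_first(\@first, \@second);
```

Going through a hash loses the original line order and collapses repeated lines, which is fine for a membership check but worth knowing if the output order matters.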
Re: Re: Removing duplicates by RCP (Acolyte) on Mar 05, 2004 at 15:59 UTC
I have a file "file1" that contains:

dave
bob
rich
jim

I have a second file "file2" that contains:

dave
rich

I need the "file2" listing to remove lines from "file1" to create a "file3" that contains only:

bob
jim

I tried this code:

    #!/usr/bin/perl
    use strict;
    # use warnings;
    open(MYOUTFILE, ">file3");
    open(MYOUTFILE, ">>file3");
    my $file = shift or die "Usage: $0 file1 < file2";
    my $file_contents = do { local (@ARGV, $/) = $file; <> };
    my $nets = join "\|", map {chomp;$_} $file_contents;
    /$nets/ or print MYOUTFILE while <STDIN>;
    close(MYOUTFILE);

But it does not give me what my example file3 should look like. Help!

Thanks.. RCP
Re: Re: Re: Removing duplicates by talexb (Chancellor) on Mar 05, 2004 at 18:36 UTC
From the Perl debugger session I just ran:

    [alex@rand alex]$ perl -de 1

    Loading DB routines from perl5db.pl version 1.19
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(-e:1):   1
      DB<1> @file1=qw/dave bob rich jim/;
      DB<2> @file2=qw/dave rich/;
      DB<3> foreach(@file1){$list{$_}=1;}
      DB<4> foreach(@file2){delete $list{$_};}
      DB<5> print "Remaining elements " . join("; ",keys %list) . "\n";
    Remaining elements jim; bob
      DB<6>

You can fiddle with the 'list' hash as you read the file, so there's no need to use an array as the middle man.

Alex / talexb / Toronto

Life is short: get busy!

PS: When posting code, put it in between 'code' tags. That way it doesn't wrap like normal text.
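For the file1/file2/file3 example, working on the hash while reading could look roughly like the sketch below. It loads the exclusion list into a hash and then streams the other file, so the surviving lines keep their original order. The sub name `filter_lines` is hypothetical, and the in-memory filehandles (which assume Perl 5.8 or later) are only there to keep the example self-contained; real code would open file1, file2 and file3 by name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in_fh to $out_fh, skipping any line listed in $exclude_fh.
sub filter_lines {
    my ($exclude_fh, $in_fh, $out_fh) = @_;

    # Build the lookup hash directly while reading the exclusion list.
    my %skip;
    while (my $line = <$exclude_fh>) {
        chomp $line;
        $skip{$line} = 1;
    }

    # Stream the input; exact whole-line matches are dropped.
    while (my $line = <$in_fh>) {
        chomp(my $key = $line);
        print {$out_fh} $line unless $skip{$key};
    }
    return;
}

# Hypothetical stand-ins for the contents of file1, file2 and file3.
my $file1 = "dave\nbob\nrich\njim\n";
my $file2 = "dave\nrich\n";
my $file3 = '';

open my $in,  '<', \$file1 or die "Cannot open input: $!";
open my $ex,  '<', \$file2 or die "Cannot open exclusion list: $!";
open my $out, '>', \$file3 or die "Cannot open output: $!";
filter_lines($ex, $in, $out);
close $_ for $in, $ex, $out;

print $file3;
```

Unlike a regex alternation, the hash lookup only removes lines that match an entry exactly, so `rich` in the list would not remove a line reading `richard`.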
Re: Re: Re: Re: Removing duplicates by RCP (Acolyte) on Mar 06, 2004 at 18:11 UTC
Re: Removing duplicates by delirium (Chaplain) on Feb 25, 2004 at 21:50 UTC
Something like...

    my %hash = ();
    while (<>) {
        # [^']+ also matches names such as '/ad1', which \w+ would miss
        if (/Net '([^']+)'/) { $hash{$1} = 1; }
    }
    print "$_\n" for keys %hash;

...will get you through the first question.
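Since `keys` returns hash keys in no particular order, the output may not come out in the order the nets appear in file1. If that matters, a variation along these lines keeps first-seen order. This is only a sketch: `unique_nets` is a made-up name, the sample lines are copied from the question, and the quotes are printed because the requested file2 shows them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return each quoted net name once, in the order first seen.
sub unique_nets {
    my @lines = @_;
    my (%seen, @nets);
    for my $line (@lines) {
        # Only lines starting with "Net" count; [^']+ allows names like /ad1.
        next unless $line =~ /^Net\s+'([^']+)'/;
        push @nets, $1 unless $seen{$1}++;
    }
    return @nets;
}

# Sample lines taken from the question's file1.
my @sample = (
    "# This is a list of selected nets",
    "Net 'GROUND' top side",
    "Net 'GROUND' bottom side",
    "Net '/ad1'   top side",
    "Net '/ad1'   bottom side",
    "Net '/VCCA'  top side",
    "Net '/VCCA'  bottom side",
);

print "'$_'\n" for unique_nets(@sample);
```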