creating a new file with unique values

afasch01 has asked for the wisdom of the Perl Monks concerning the following question:

Re: creating a new file with unique values by Gilimanjaro (Hermit) on Jan 20, 2003 at 17:42 UTC
Assuming the order of the occurrences doesn't matter, the easiest way is to store the digits as keys in a hash. Keys are always unique, and you can use any value as the hash value:

`$file = "project.txt";
open(FILE, "$file");
my %hash = ();
while (my $a = <FILE>) {
    if ($a =~ /\tCM+(\d*)/io) {
        print "$1\n";
        $hash{$1} = undef;
    }
}
close FILE;
open(OUTPUT, ">> project.out");
while (defined(my $key = each %hash)) {
    print OUTPUT "$key\n";
}
close OUTPUT;`

Or how I would probably write it:

`my %done = ();
open INPUT, "<$inputfile";
open OUTPUT, ">>$outputfile";
while (<INPUT>) {
    next unless /\tCM+(\d+)/io;
    next if exists $done{$1};
    print OUTPUT "$1\n";
    $done{$1} = undef;
}
close INPUT;
close OUTPUT;`

Haven't tested it, but it should work, and be fast. The <HANDLE> operator and regular expressions have the useful property that they both use the default variable ($_) if you supply none, so you can bypass the use of $a.
Re: creating a new file with unique values by BrowserUk (Patriarch) on Jan 20, 2003 at 17:55 UTC
`perl -nle "print $1 if /^CM(\d+)/i and ++$h{$1} == 1" numbers >unique`

(Adjust the quotes to your system's requirements.)

Examine what is said, not who speaks. The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
Re: creating a new file with unique values by hardburn (Abbot) on Jan 20, 2003 at 17:35 UTC
First, please use <CODE> tags around your code. Second, always use strict and warnings. Now for your problem. If you have the memory, put your data into a hash. Don't print it to the output file until you've read all the data from the input file.

`my $file = "project.txt";
# Use three-argument form of open() (available since perl 5.6.0)
# and check the return value.
open(FILE, '<', $project) or die "Can't open $project: $!\n";
my %input;
while (my $line = <FILE>) {
    chomp $line; # Get rid of whitespace (newline) at the end of the string
    if ($line =~ /\tCM+(\d*)/io) {
        $input{$1}++;
    }
}
close(FILE);`

After the above code runs, %input will contain the digits as keys, with the value being the number of times that key shows up in the input file. Printing to the output file is even easier. Just check whether the value in the %input hash is greater than 1 before printing, which skips any number that appeared more than once:

`open(OUT, '>>', 'project.out') or die "Can't open project.out for writing: $!\n";
foreach my $i (keys %input) {
    print OUT "$i\n" unless $input{$i} > 1;
}
close OUT;`
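The counting hash supports two different readings of "unique", and it may help to see them side by side. The following is only a minimal sketch: the sub name `count_ids` and the sample lines are made up for illustration and do not come from the thread, and the pattern is simplified to `\tCM(\d+)`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count how often each
# number following a tab and "CM" appears in the given lines.
sub count_ids {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        $count{$1}++ if $line =~ /\tCM(\d+)/i;
    }
    return %count;
}

# Made-up sample data standing in for project.txt.
my @sample = ("a\tCM101\n", "b\tCM202\n", "c\tcm101\n");
my %count  = count_ids(@sample);

# Reading 1: every distinct value, each listed once.
my @distinct = sort { $a <=> $b } keys %count;

# Reading 2: only the values that occur exactly once in the input.
my @singletons = sort { $a <=> $b } grep { $count{$_} == 1 } keys %count;

print "distinct: @distinct\n";
print "once only: @singletons\n";
```

With this sample, the first list holds 101 and 202, while the second holds only 202, because 101 appears twice. Which reading is wanted depends on the original question.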
Re: Re: creating a new file with unique values by poj (Abbot) on Jan 20, 2003 at 17:52 UTC
Correction: `open(FILE, '<', $file) or die "Can't open $file: $!\n";`

poj
Re: Re: creating a new file with unique values by Gilimanjaro (Hermit) on Jan 20, 2003 at 17:51 UTC
Why would you want to delay writing the file? If no hash entry exists yet, you know it can be written anyway. Also, using foreach/keys to loop through a hash is very inefficient, especially with big hashes: perl has to traverse the entire hash to collect all the keys, and when obtaining the value in the loop body, it has to look up the key in the hash again. The preferred method would be to use a while/each loop. Your code would then look like:

`while (my ($i, $count) = each %input) {
    print OUT "$i\n" unless $count > 1;
}`

Using the while/each construct would also be a lot cleaner if the hash happened to be something like a tied database query result hash, if said hash supported database row cursors. But that actually has nothing to do with the topic... :)

Happy coding, G.
Re: creating a new file with unique values by jmcnamara (Monsignor) on Jan 20, 2003 at 18:08 UTC
Here is a one-liner that should do it; the command line options are explained in perlrun:

`perl -lne '/\tCM+(\d+)/; print $1 if defined $1 and not $seen{$1}++' file1 > file2`

-- John.
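For readers unfamiliar with the switches, here is a rough long-hand equivalent of what `-n` and `-l` contribute. It is a sketch rather than an exact expansion: the sub name `unique_ids` and the in-memory sample data are assumptions made for illustration, and it tests the match result directly instead of checking `defined $1`.

```perl
use strict;
use warnings;

# Hypothetical sub (not from the thread): return the first occurrence
# of each number, in input order, read from any filehandle.
sub unique_ids {
    my ($in) = @_;
    my (%seen, @ids);
    while (my $line = <$in>) {      # -n supplies a read loop like this
        chomp $line;                # -l strips the line ending on input
        next unless $line =~ /\tCM+(\d+)/;
        # $seen{$1}++ is false only on the first sighting of a value.
        push @ids, $1 unless $seen{$1}++;
    }
    return @ids;
}

# Made-up sample input, read through an in-memory filehandle.
my $data = "a\tCM12\nb\tCM34\nc\tCM12\nd\tXX99\n";
open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
print "$_\n" for unique_ids($fh);   # -l also appends "\n" on output
close $fh;
```

Because values are pushed as they are first seen, this form keeps the input order, which a later pass over the hash keys would not.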
Re: creating a new file with unique values by afasch01 (Initiate) on Jan 23, 2003 at 22:30 UTC
Thank you all for your help! It's not very pretty and it could use some work, but it actually DOES work! Here is my final version, if anyone is interested, and thanks again!

`my %hash = ();
open INPUT, "<project.txt";
open OUTPUT, ">>project.out";
while (<INPUT>) {
    next unless /PW\#/io;
    next if exists $hash{$'};
    print OUTPUT "$'";
    $hash{$'} = undef;
}
close INPUT;
close OUTPUT;`

afasch01
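The final script relies on `$'` (the text after the match) and on unchecked opens. One possible tidy-up, offered only as a sketch, captures the text after `PW#` explicitly, since perlvar has historically warned that using `$'` can slow down regex matching across the whole program. The sub name `copy_unique_suffixes` and the sample data are hypothetical, and in-memory filehandles stand in for project.txt and project.out.

```perl
use strict;
use warnings;

# Hypothetical sub (not from the thread): copy the text after the first
# "PW#" on each line from $in to $out, skipping values already written.
sub copy_unique_suffixes {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /PW#(.*)/i;   # explicit capture instead of $'
        next if $seen{$1}++;               # already written once
        print {$out} "$1\n";
    }
    return;
}

# Made-up sample data; real use would open the actual files with
# three-argument open and "or die", as suggested earlier in the thread.
my $input  = "id PW#alpha\nid pw#beta\nother PW#alpha\nno marker\n";
my $output = '';
open my $in,  '<', \$input  or die "Can't read sample: $!";
open my $out, '>', \$output or die "Can't write sample: $!";
copy_unique_suffixes($in, $out);
close $in;
close $out;
print $output;
```

Unlike the original, this version always writes a trailing newline, even when the last input line lacks one.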