to avoid redundacy in a file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: to avoid redundacy in a file by amphiplex (Monk) on Jul 15, 2002 at 09:31 UTC
You could remove all duplicate lines with something like this: `my %seen; while (<>) { next if $seen{$_}; print; $seen{$_}++; }` ---- amphiplex
Re^2: to avoid redundacy in a file by tadman (Prior) on Jul 15, 2002 at 09:43 UTC
To avoid redundancy in your code, you could do this: `my %seen; while (<>) { next if ($seen{$_}++); print; }` You can do it in one shot, so you might as well. Note that this code eliminates every duplicate line, wherever it occurs, not just consecutive repeats. If you only want to ditch consecutive repeats, use this: `my $last; while (<>) { next if ($_ eq $last); $last = $_; print; }` Thus the lines "A A A B B B A A C C" become "A B A C", not "A B C" as with the first snippet.
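The difference between the two snippets is easier to see side by side. What follows is a minimal sketch, not tadman's code: the sub names `drop_all_duplicates` and `drop_consecutive_repeats` are made up for illustration, and the "A A A B B B A A C C" example is held in a list instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first occurrence of each item.
sub drop_all_duplicates {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical helper: collapse runs of identical adjacent items.
sub drop_consecutive_repeats {
    my @out;
    for my $item (@_) {
        push @out, $item unless @out && $out[-1] eq $item;
    }
    return @out;
}

my @lines = qw(A A A B B B A A C C);
print join(' ', drop_all_duplicates(@lines)), "\n";        # A B C
print join(' ', drop_consecutive_repeats(@lines)), "\n";   # A B A C
```

The first version has to remember every distinct line it has seen, while the second only remembers the previous one, which matters once the file gets large.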
Re^3: to avoid redundacy in a file by Aristotle (Chancellor) on Jul 15, 2002 at 13:55 UTC
"You can do it in one shot, so you might as well." That makes your second snippet `my $prev; while (<>) { next if ($_ eq $prev); print $prev = $_; }` `:^)` Wait, we can shorten that... `my $prev; while (<>) { print $prev = $_ unless $_ eq $prev; }` Hmm... `my $prev; $_ ne $prev and print $prev = $_ while <>;` Err... sorry, got carried away for a second. Perl is just too seductive. Sigh. `:-)` Makeshifts last the longest.
Re^4: to avoid redundacy in a file by tadman (Prior) on Jul 15, 2002 at 19:06 UTC
Re: Re: to avoid redundacy in a file by Purdy (Hermit) on Jul 15, 2002 at 15:00 UTC
This won't do exactly what the AM wants: some lines will be duplicates to the user, but not to Perl:
`$ more file.txt
N AB TX NC
AB N TX NC
FOO BAR
N AB TX NC
$ perl test.pl file.txt
N AB TX NC
AB N TX NC
FOO BAR`
The first two lines of file.txt are "the same" to the user, but not to your program. zejames' solution meets the AM's needs, as it creates a unique key for the hash, based on the AM's definition of a duplicate. Jason
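Purdy's point can be sketched as follows. This is an illustration under assumptions, not code from the thread: the sub names `canonical_key` and `unique_by_fields` are hypothetical, fields are assumed to be whitespace-separated, and the sample data is the file.txt content shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a key that ignores field order and spacing.
sub canonical_key {
    my ($line) = @_;
    my @fields = split ' ', $line;   # split ' ' also skips leading whitespace
    return join ' ', sort @fields;
}

# Hypothetical helper: keep the first line seen for each canonical key.
sub unique_by_fields {
    my %seen;
    return grep { !$seen{ canonical_key($_) }++ } @_;
}

my @lines = ("N AB TX NC", "AB N TX NC", "FOO BAR", "N AB TX NC");
print "$_\n" for unique_by_fields(@lines);
# prints:
#   N AB TX NC
#   FOO BAR
```

The surviving lines keep their original field order; only the hash key is sorted.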
Re: to avoid redundacy in a file by zejames (Hermit) on Jul 15, 2002 at 09:34 UTC
One way to do it:
`# We are modifying the $/ variable, so we limit the scope
# by adding some {} around the code
{
    local $/ = '';
    $^I = '.bak'; # See man perl and the -i switch for that trick
    @ARGV = ('data.txt');
    while (<>) {
        # The order is not important, so we sort the fields to
        # obtain a unique id
        $sorted = join ':', sort split /\s+/;
        print if (! $seen{$sorted}++ );
    }
}`
HTH Update: added comments to the code -- zejames
Re: to avoid redundacy in a file by thor (Priest) on Jul 15, 2002 at 11:59 UTC
Depending on your database setup and how you insert rows, you could also impose a unique key constraint. Failing that, you will want to sort the fields of each record and then test for equality, i.e. (warning: untested)
`my %hash;
while (<>) {
    my $key = join " ", (sort (split " "));
    $hash{$key} = 1;
}
# now iterate over the keys of the hash, and either print them out
# or do your insert into the database`
Mind you, this is feasible only for small files, for certain values of small, because every key is held in memory. If your file is large, you may want to just do the `join` line, write the result to another file, and then let `sort -u` do your bidding. That assumes that you are on Unix or one of its derivatives (unless there is a sort for Windoze... :) thor
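The large-file variant thor describes might look roughly like this. It is a sketch under assumptions: `normalize_record` and `normalize_stream` are hypothetical names, fields are assumed to be whitespace-separated, blank lines are skipped, and the demo reads from an in-memory string so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one record with its fields in sorted order.
sub normalize_record {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return join ' ', sort @fields;
}

# Hypothetical helper: copy records from $in to $out in normalized form,
# skipping blank lines. Only one line is held in memory at a time.
sub normalize_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        next unless $line =~ /\S/;
        print {$out} normalize_record($line), "\n";
    }
    return;
}

# Demo on an in-memory file; real use would pass \*STDIN instead.
my $input = "N AB TX NC\nAB N TX NC\nFOO BAR\n";
open my $in, '<', \$input or die "cannot open in-memory input: $!";
normalize_stream($in, \*STDOUT);
# prints:
#   AB N NC TX
#   AB N NC TX
#   BAR FOO
```

With `\*STDIN` passed in place of the in-memory handle, and the script saved under a placeholder name such as normalize.pl, the output could be piped through `sort -u` (for example `perl normalize.pl < data.txt | sort -u`) so that the external sort, not a Perl hash, does the deduplication. Note that the surviving records come out in normalized field order, not as originally written.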