Here's a simpler version that keeps the first line of each set of duplicates -- I'm including the input data with the script (you'll be reading it from a file), and I'm just printing the output to STDOUT (you can open an output file, or just redirect stdout when you run the script):
(BTW, this sort of thing would normally be done using while(<>) to read and process the input data one line at a time, rather than reading the whole file into memory and then looping over it with for my $line (@lines) -- but it's not a big deal in this case. A sketch of that streaming style follows the code below.)

use strict;
use warnings;

my %file_hash;
my @lines = <DATA>;

for my $line (@lines) {
    chomp($line);
    my ($name, $url, $text) = split('@', $line);
    $file_hash{$url} = $line unless ( $file_hash{$url} );
}

for my $key ( sort keys %file_hash ) {
    print "$file_hash{$key}\n";
}

__DATA__
name1@url1@text1
name1@url1@text1
name1@url1@text11
name2@url2@text2
name2@url2@text21
name3@url3@text3
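For what it's worth, here's a minimal sketch of that while(<>) streaming style -- untested against your real data, and it assumes the input comes from a file named on the command line (or from STDIN) rather than the __DATA__ section; note it prints lines in input order rather than sorted by url:

use strict;
use warnings;

# Keep only the first line seen for each url; later duplicates are skipped.
my %seen;
while ( my $line = <> ) {
    chomp $line;
    my ($name, $url, $text) = split('@', $line);
    print "$line\n" unless $seen{$url}++;
}

You'd run it as something like "perl dedup.pl input.file > output.file" (dedup.pl being whatever you name the script). The %seen hash with a post-increment is the usual idiom for first-occurrence deduplication, and it never holds more than one small entry per distinct url.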
You might be interested in looking at this command-line utility: col-uniq -- remove lines that match on selected column(s). It would produce the output you want from your particular input file like this:
col-uniq -d '@' -c 2 input.file > output.file

But in order for that to work, you'd need to make sure the input data was sorted according to the url field, whereas the input doesn't need to be sorted for the snippet above to work.
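If your input isn't already sorted that way, a standard sort(1) pass can put it in the required order first -- this assumes a POSIX-style sort that takes -t for the field delimiter and -k for the key field, and sorted.file is just a name for the intermediate file:

sort -t '@' -k 2,2 input.file > sorted.file
col-uniq -d '@' -c 2 sorted.file > output.file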
In reply to Re^3: removing duplicates lines plus strings from a file
by graff
in thread removing duplicates lines plus strings from a file
by kirpy