remove repeated lines with space differences

joeperl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I'm using the code below to remove repeated lines from my file.

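sub remove_repeated(){
  my %seen;                      # lines already printed
  local @ARGV = ($file_name);    # $file_name is a global set elsewhere
  local $^I = ".bak";            # edit the file in place, keep a .bak backup
  while(<>){
    $seen{$_}++;
    next if $seen{$_}>1;         # skip every repeat after the first
    print;                       # with $^I set, print writes back to the file
  }
}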

The problem is that if my input file has two lines that differ only by white space at the beginning or end, the code above treats them as unique. How can I change it so that it ignores leading and trailing white space?
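The reason is that `%seen` compares its keys as exact strings, so a single leading or trailing space produces a separate key. A small demonstration with made-up sample lines:

```perl
use strict;
use warnings;

# Hash keys are compared as exact strings, so lines that differ only in
# white space get separate keys and all of them count as "unique".
my %seen;
my @kept;
for my $line ("hello world\n", " hello world\n", "hello world \n") {
    next if $seen{$line}++;    # only skips a byte-for-byte repeat
    push @kept, $line;
}
printf "%d of 3 lines kept\n", scalar @kept;
```

All three lines survive, because none of them is a byte-for-byte repeat of another.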


Re: remove repeated lines with space differences by ikegami (Patriarch) on Feb 16, 2010 at 06:46 UTC
sub remove_repeated {
  my ($file_name) = @_;
  my %seen;
  local @ARGV = $file_name;
  local $^I = ".bak";
  while(<>){
    s/^\s+//;                # strip leading white space
    s/\s+\z/\n/;             # replace trailing white space with one newline
    next if $seen{$_}++;     # skip lines already seen
    print;
  }
}

Since you consider the leading and trailing white space insignificant, I didn't see a problem with removing it. If you don't want to remove the white space from the file, copy `$_`, trim the copy, and use the copy as the key to the hash.

Update: I had `local @ARGV = @_;` originally, but I reverted it back to one file name since `%seen` was being incorrectly shared between all the arguments.
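The "trim a copy" variant mentioned above might look like this. It is a minimal sketch, not code from the thread: `dedup_lines` and `remove_repeated_keep_ws` are hypothetical names, and, like the reply, it handles one file name per call so `%seen` is never shared between files.

```perl
use strict;
use warnings;

# Return @lines with repeats removed, where two lines count as repeats if
# they are equal after trimming leading/trailing white space. The trimmed
# copy is used only as the hash key; the original line is what gets kept.
sub dedup_lines {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        (my $key = $line) =~ s/^\s+|\s+\z//g;   # trimmed copy as the key
        next if $seen{$key}++;                  # already seen this key
        push @kept, $line;                      # keep the untrimmed original
    }
    return @kept;
}

# In-place variant in the style of the thread (hypothetical name).
# Kept lines are written back exactly as they were, white space included.
sub remove_repeated_keep_ws {
    my ($file_name) = @_;
    my %seen;
    local @ARGV = ($file_name);
    local $^I   = ".bak";                       # edit in place, keep a .bak backup
    while (my $line = <>) {
        (my $key = $line) =~ s/^\s+|\s+\z//g;
        next if $seen{$key}++;
        print $line;                            # goes to the in-place output file
    }
}
```

For example, `dedup_lines("foo\n", "  foo \n", "bar\n")` returns `("foo\n", "bar\n")`: the second line is dropped because its trimmed key matches the first.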
Re^2: remove repeated lines with space differences by Utilitarian (Vicar) on Feb 16, 2010 at 10:59 UTC
I'd also suggest `s/\s+/ /g;` to collapse each remaining run of internal white space into a single space.
Re^3: remove repeated lines with space differences by ikegami (Patriarch) on Feb 16, 2010 at 15:47 UTC
He said he wanted to ignore differences in leading and trailing white space. I didn't include that because it would also remove differences in internal white space.
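The difference between the two suggestions shows up when counting unique lines under each rule. A minimal sketch with made-up sample lines; `trim_key`, `collapse_key` and `count_unique` are hypothetical helper names, not from the thread.

```perl
use strict;
use warnings;

# Key that ignores only leading/trailing white space (trim-only rule).
sub trim_key {
    my ($line) = @_;
    (my $key = $line) =~ s/^\s+|\s+\z//g;
    return $key;
}

# Key that also collapses internal runs of white space to one space
# (trim-only rule plus s/\s+/ /g).
sub collapse_key {
    my $key = trim_key(shift);
    $key =~ s/\s+/ /g;
    return $key;
}

# Count how many lines survive de-duplication under a given key function.
sub count_unique {
    my ($key_fn, @lines) = @_;
    my %seen;
    return scalar grep { !$seen{ $key_fn->($_) }++ } @lines;
}

my @sample = ("foo bar\n", "  foo bar\n", "foo   bar\n", "foo\tbar \n");
printf "trim only: %d unique\n", count_unique(\&trim_key, @sample);
printf "collapse:  %d unique\n", count_unique(\&collapse_key, @sample);
```

With these four lines, trimming alone leaves three unique lines, while adding `s/\s+/ /g` merges all four into one. That extra merging of lines that differ in internal white space is what the trim-only answer avoids.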
Re^4: remove repeated lines with space differences by joeperl (Acolyte) on Feb 18, 2010 at 06:05 UTC