joeperl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I'm using the code below to remove repeated lines from my file.

sub remove_repeated() {
    my %seen;
    local @ARGV = ($file_name);
    local $^I = ".bak";
    while (<>) {
        $seen{$_}++;
        next if $seen{$_} > 1;
        print;
    }
}

The problem is that if my input file has two lines that differ only by white space at the beginning or end, the code above treats them as unique. How can I change it to ignore that white space?

Re: remove repeated lines with space differences
by ikegami (Patriarch) on Feb 16, 2010 at 06:46 UTC
    sub remove_repeated {
        my ($file_name) = @_;
        my %seen;
        local @ARGV = $file_name;
        local $^I = ".bak";
        while (<>) {
            s/^\s+//;
            s/\s*$/\n/;
            next if $seen{$_}++;
            print;
        }
    }

    Since you consider the leading and ending white space insignificant, I didn't see a problem with removing it. If you don't want to remove the white space from the file, copy $_, trim the copy, and use the copy as the key to the hash.
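
    The copy-and-trim variant described above might look like the following. This is only a sketch: the name remove_repeated_keep_ws is made up for illustration, and it assumes the first occurrence of each line should be written back exactly as it appeared.

    ```perl
    use strict;
    use warnings;

    # Hypothetical variant: dedupe on a trimmed copy, but leave the
    # surviving lines in the file untouched.
    sub remove_repeated_keep_ws {
        my ($file_name) = @_;
        my %seen;
        local @ARGV = $file_name;
        local $^I   = ".bak";       # in-place edit, original saved as *.bak
        while (<>) {
            my $key = $_;           # work on a copy so $_ is not modified
            $key =~ s/^\s+//;       # drop leading white space
            $key =~ s/\s+$//;       # drop trailing white space (and newline)
            next if $seen{$key}++;  # skip lines whose trimmed form was seen
            print;                  # write the original line, spacing intact
        }
    }
    ```

    Only the hash key is normalized here, so a line such as "  foo" stays indented in the output while a later "foo" is dropped as a repeat.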

    Update: I originally had local @ARGV = @_;, but I went back to a single file name because %seen was being incorrectly shared across all the files passed in.
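
    If several files do need to be handled in one call, one possible approach (a sketch only, not something tested in this thread) is to clear %seen at the end of each file. The name remove_repeated_each is hypothetical; the sketch relies on eof without parentheses being true at the end of each file in a while (<>) loop.

    ```perl
    use strict;
    use warnings;

    # Hypothetical multi-file variant: each file is deduplicated on its own.
    sub remove_repeated_each {
        my @file_names = @_;
        return unless @file_names;  # an empty @ARGV would make <> read STDIN
        my %seen;
        local @ARGV = @file_names;
        local $^I   = ".bak";
        while (<>) {
            s/^\s+//;
            s/\s*$/\n/;
            print unless $seen{$_}++;
            %seen = () if eof;      # end of the current file: start afresh
        }
    }
    ```

    Without the reset, a line that appears in the first file would also be removed from every later file, which is the sharing problem mentioned in the update.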

      I'd also suggest
      s/\s+/ /g;

      print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
        He said he wanted to ignore differences in leading and trailing white space. I didn't include that because it would also remove differences in internal white space.
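
        To make the distinction concrete, here is a small sketch of the two ways of normalizing a line before using it as a hash key. The helper names trim_key and collapse_key are made up for this example.

        ```perl
        use strict;
        use warnings;

        # Trim only: ignores leading and trailing white space.
        sub trim_key {
            my ($line) = @_;
            $line =~ s/^\s+//;
            $line =~ s/\s+$//;
            return $line;
        }

        # Trim, then collapse each internal run of white space to one space.
        sub collapse_key {
            my ($line) = @_;
            $line = trim_key($line);
            $line =~ s/\s+/ /g;
            return $line;
        }
        ```

        With trim_key, "a  b" and "a b" remain different lines. With collapse_key they become the same key, so one of them would be dropped as a repeat.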