Re^2: What does 'next if $hash{$elem}++;' mean?

Re^3: What does 'next if $hash{$elem}++;' mean? by Fletch (Bishop) on Feb 17, 2006 at 15:49 UTC
You'll need to define "doesn't work" (aside from the obvious problems of not checking the return value from your opens, or redefining `@unique`, or not printing newlines in the last `foreach`, or that you seem to be doing redundant work with both the `map { $_, 1 } @array` and the subsequent `foreach`).
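The idiom from the thread title does the duplicate check and the filtering in a single pass, which is why a separate `map { $_, 1 } @array` step is not needed. This is a minimal sketch that assumes the lines are already in an array. The sub name `dedupe` is hypothetical and does not come from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: return the elements of @_ with duplicates removed,
# keeping the first occurrence of each one in its original position.
sub dedupe {
    my %seen;
    my @unique;
    for my $elem (@_) {
        # $seen{$elem}++ returns the count *before* incrementing:
        # 0 (false) the first time an element is seen, then 1, 2, ... (true),
        # so every repeat is skipped by `next`.
        next if $seen{$elem}++;
        push @unique, $elem;
    }
    return @unique;
}

my @lines  = ("a\n", "b\n", "a\n", "c\n", "b\n");
my @unique = dedupe(@lines);
print @unique;    # the lines still carry their newlines, so this prints a, b, c on separate lines
```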
Re^3: What does 'next if $hash{$elem}++;' mean? by inman (Curate) on Feb 17, 2006 at 15:51 UTC
It isn't necessary to do all of the intermediate processing. You can read each line and check it at the same time. The line is used as a hash key. Its value is tested before it is incremented, so the line is added to the array only the first time it is seen.

```perl
my %seen;
while (<PROCESSED_FILE>) {
    # $seen{$_}++ yields the old count (false on first sight), then increments it
    push @unique, $_ unless $seen{$_}++;
}
```

This is OK as long as the hash doesn't grow too big. Using MD5 hashes of the lines as the hash keys is a useful technique.
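For long lines, one way to apply the MD5 suggestion is to key the hash on a fixed-size digest instead of the line itself. This is a minimal sketch that assumes the core `Digest::MD5` module. The sub name `unique_lines_md5` is hypothetical. There is a trade-off: two different lines with the same digest would cause the second one to be dropped. Accidental collisions are vanishingly unlikely, but MD5 collisions can be constructed deliberately.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: read lines from a filehandle and return the unique
# ones in input order. The %seen hash stores a 16-byte binary MD5 digest
# per distinct line rather than the (possibly long) line itself.
sub unique_lines_md5 {
    my ($fh) = @_;
    my (%seen, @unique);
    while (my $line = <$fh>) {
        push @unique, $line unless $seen{ md5($line) }++;
    }
    return @unique;
}

# Usage with an in-memory filehandle, so no real file is needed
my $text = "alpha\nbeta\nalpha\ngamma\nbeta\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @unique = unique_lines_md5($fh);
close $fh;
print @unique;
```

The memory saving applies only to the hash. If you print lines as you go instead of collecting them in `@unique`, the digests are all that stays in memory.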
Re^3: What does 'next if $hash{$elem}++;' mean? by dragonchild (Archbishop) on Feb 17, 2006 at 16:06 UTC
```perl
use List::MoreUtils qw( uniq );

sub Remove_duplicate_lines {
    my ($processed_file, $out_file) = @_;

    open (PROCESSED_FILE, "<$processed_file")
        or die "Can't open $processed_file: $!";
    chomp( my @array = <PROCESSED_FILE> );
    close PROCESSED_FILE;

    # uniq keeps the first occurrence of each line, in input order
    my @unique = uniq @array;

    open (OUTFILE, ">$out_file")
        or die "Can't open $out_file: $!";
    # print the de-duplicated lines, restoring the newlines removed by chomp
    print OUTFILE map { "$_\n" } @unique;
    close OUTFILE;
}
```

My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
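If installing `List::MoreUtils` isn't an option, recent Perls ship a `uniq` function in the core `List::Util` module, added in version 1.45. This is a minimal sketch that assumes a `List::Util` of at least that version is installed. The color list is only illustrative:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq first appeared in List::Util 1.45

my @lines  = qw(red blue red yellow blue);
my @unique = uniq @lines;       # keeps the first occurrence of each value, in input order
print "$_\n" for @unique;       # red, blue, yellow
```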
Re^3: What does 'next if $hash{$elem}++;' mean? by johngg (Canon) on Feb 17, 2006 at 18:38 UTC
Chaps, I had a go at benchmarking various ways of pulling unique values out of a list that I had seen in various textbooks. It looks like a hash slice is the quickest on my ancient hardware. Your mileage may vary.

```perl
#!/usr/local/bin/perl
#
use warnings;
use strict;
use Benchmark;

# The word list after __END__ is read through DATA; split each line into
# words so that every word becomes one element of @data.
our @data = map { split ' ' } <DATA>;

# Count every item, then return the distinct keys
our $rcHash = sub {
    my %seen = ();
    $seen{$_} ++ for @data;
    return keys %seen;
};

# Keep an item only the first time it is seen
our $rcHashGrep = sub {
    my %seen = ();
    return grep {! $seen{$_} ++} @data;
};

# Create all the keys at once with a hash slice
our $rcHashSlice = sub {
    my %uniq;
    @uniq{@data} = ();
    return keys %uniq;
};

# Explicit loop version of the grep approach
our $rcListHash = sub {
    my %seen = ();
    my @uniq = ();
    foreach my $item (@data) {
        push @uniq, $item unless $seen{$item} ++;
    }
    return @uniq;
};

# Build an anonymous hash with map and return its keys
our $rcMapHash = sub {
    return keys %{{map {$_ => 1} @data}};
};

timethese(5000, {
    Hash      => $rcHash,
    HashGrep  => $rcHashGrep,
    HashSlice => $rcHashSlice,
    ListHash  => $rcListHash,
    MapHash   => $rcMapHash});

__END__
red blue yellow green black white purple mauve pink grey
violet black white blue green red mauve violet black
red blue yellow green black white purple mauve pink grey
violet black white blue green red mauve violet black
mauve violet black red blue yellow green black violet black
red blue yellow mauve pink grey violet black white blue green
yellow green black iolet black red green black white purple
mauve pink yellow green black violet black
red blue yellow mauve pink grey violet black white blue green
```

This produces the following results:

```
Benchmark: timing 5000 iterations of Hash, HashGrep, HashSlice, ListHash, MapHash...
      Hash:  4 wallclock secs ( 3.48 usr +  0.00 sys =  3.48 CPU) @ 1436.78/s (n=5000)
  HashGrep:  4 wallclock secs ( 3.29 usr +  0.00 sys =  3.29 CPU) @ 1519.76/s (n=5000)
 HashSlice:  1 wallclock secs ( 1.16 usr +  0.00 sys =  1.16 CPU) @ 4310.34/s (n=5000)
  ListHash:  5 wallclock secs ( 5.03 usr +  0.00 sys =  5.03 CPU) @ 994.04/s (n=5000)
   MapHash:  6 wallclock secs ( 5.89 usr +  0.00 sys =  5.89 CPU) @ 848.90/s (n=5000)
```

Cheers,

JohnGG
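The timings don't show that the candidates return their results in different orders. `Hash`, `HashSlice` and `MapHash` return `keys %hash`, which comes back in Perl's internal hash order. `HashGrep` and `ListHash` keep the first occurrence of each item in input order. The following small sketch checks this. The word list is made up for illustration:

```perl
use strict;
use warnings;

my @data = qw(red blue red yellow blue green);

# HashSlice approach: fast, but the result is in whatever order keys() returns
my %uniq;
@uniq{@data} = ();
my @by_slice = keys %uniq;

# HashGrep approach: keeps the first occurrence of each item, in input order
my %seen;
my @by_grep = grep { !$seen{$_}++ } @data;

print "slice: @by_slice\n";   # order not guaranteed; may differ between runs
print "grep:  @by_grep\n";    # red blue yellow green

# Both approaches find the same set of distinct values
print "same set\n" if join(',', sort @by_slice) eq join(',', sort @by_grep);
```

If the output order matters, as it does when de-duplicating lines of a file, the faster slice version needs an extra step to recover it.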
Re^3: What does 'next if $hash{$elem}++;' mean? by blazar (Canon) on Feb 17, 2006 at 18:03 UTC
```perl
# Simple enough?!?
sub remove_duplicate_lines {
    my ($infile, $outfile)=@_;
    open my $in, '<', $infile or die "$infile: $!\n";
    open my $out, '>', $outfile or die "$outfile: $!\n";
    my $oldout=select $out;
    my %saw;
    $saw{$_}++ or print while <$in>;
    select $oldout;
}
```
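The version above uses `select` so that the bare `print` writes to `$out`. If you'd rather not change the default output handle, the same loop can name the handle explicitly. This is a minimal sketch. The demo file contents are illustrative, and `File::Temp` is used only to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Same single-pass filter, but printing to $out explicitly instead of
# temporarily select()ing it as the default output handle.
sub remove_duplicate_lines {
    my ($infile, $outfile) = @_;
    open my $in,  '<', $infile  or die "$infile: $!\n";
    open my $out, '>', $outfile or die "$outfile: $!\n";
    my %saw;
    while (my $line = <$in>) {
        # post-increment: false the first time a line is seen, true after that
        print {$out} $line unless $saw{$line}++;
    }
    close $out or die "$outfile: $!\n";
    close $in;
}

# Demo: write a small input file, filter it, then read the result back
my ($in_fh, $in_name) = tempfile(UNLINK => 1);
print {$in_fh} "one\ntwo\none\nthree\ntwo\n";
close $in_fh;
my (undef, $out_name) = tempfile(UNLINK => 1);

remove_duplicate_lines($in_name, $out_name);

open my $check, '<', $out_name or die "$out_name: $!\n";
my $result = do { local $/; <$check> };
close $check;
print $result;   # one, two, three on separate lines
```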