donkost has asked for the wisdom of the Perl Monks concerning the following question:

I receive a text file that contains multiple copies of data strings. I need to be able to capture each unique string and assign it to a variable. Here is an example of the file:
NCLEGEND-11-20
NCLEGEND-11-20
NCLEGEND-1-10
NCLEGEND-1-10
NCLEGEND-1-10
NCLEGEND-1-20
NCLEGEND-1-20
.......
Each entry can be duplicated an unknown number of times. Also, the numbers can be any combination; the only thing that is consistent is the text NCLEGEND and the dashes. So what I need to do is capture each unique entry and assign it to a variable. For example:
$nc1 = NCLEGEND-1-10
$nc2 = NCLEGEND-11-20
$nc3 = NCLEGEND-1-20
...
Any advice on how to do this would be greatly appreciated. Thanks,
don...

Replies are listed 'Best First'.
Re: Capturing Unique Data
by kennethk (Abbot) on Aug 18, 2009 at 17:21 UTC
    This is a FAQ. For multiple good ideas on how to do this (including code) once you've loaded your data into an array, see How can I remove duplicate elements from a list or array?

    In your post, you specifically request "I need to be able to capture each unique string and assign it to a variable." Is there a specific and compelling reason to use an initially unknown number of scalars as opposed to an array or hash? You could do it with Symbolic references, but those will generally cause more problems than they solve. In fact, using them in a case like this is generally considered a classic example of poor form.
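    For instance, here is a minimal sketch of the array-plus-hash approach (the file name legend.txt and handle $fh are just placeholders for however you read your data):

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'legend.txt' or die "Cannot open legend.txt: $!";

    my %seen;       # entries already encountered
    my @unique;     # unique entries, in order of first appearance
    while (my $line = <$fh>) {
        chomp $line;
        push @unique, $line unless $seen{$line}++;
    }
    close $fh;

    # $unique[0], $unique[1], ... now play the role of $nc1, $nc2, ...
    print "$_\n" for @unique;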

      Thank you. An array would be easier. I have never used a hash, so I'll have to look that up. d...
        If you'll spend any significant amount of time working with Perl, you'll definitely want to learn how to effectively use hashes - Perl's hash implementation is one of its greatest features. For some introductory material, see Perl variable types.
        Scalar, hash, and array: if you're going to use Perl, these are the bare minimum. As consolation, note that Perl lacks the usual strong-typing demons, so building any complex data structure is relatively hassle-free.
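        As a small illustration of that last point (the data and variable names below are made up), a nested structure simply springs into existence the first time you store into it, with no declarations beyond the top-level hash:

        use strict;
        use warnings;

        my %count_by_prefix;
        for my $entry (qw( NCLEGEND-1-10 NCLEGEND-1-10 NCLEGEND-11-20 )) {
            my ($prefix, $major, $minor) = split /-/, $entry;
            $count_by_prefix{$prefix}{"$major-$minor"}++;   # inner hash autovivifies
        }
        # %count_by_prefix now holds ( NCLEGEND => { '1-10' => 2, '11-20' => 1 } )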
Re: Capturing Unique Data
by ikegami (Patriarch) on Aug 18, 2009 at 17:56 UTC
    my @nc;
    my %seen;
    while (<$fh>) {
        chomp;
        push @nc, $_ if !$seen{$_}++;
    }

    Or if your input is guaranteed to be sorted,

    my @nc;
    my $last;
    while (<$fh>) {
        chomp;
        push @nc, $last = $_
            if !defined($last) || $last ne $_;
    }
      Thanks! This worked perfectly!
Re: Capturing Unique Data
by bichonfrise74 (Vicar) on Aug 19, 2009 at 01:22 UTC
    This might get you started.
    #!/usr/bin/perl
    use strict;

    my %record;
    $record{$_}++ while (<DATA>);
    print sort keys %record;

    __DATA__
    NCLEGEND-11-20
    NCLEGEND-11-20
    NCLEGEND-1-10
    NCLEGEND-1-10
    NCLEGEND-1-10
    NCLEGEND-1-20
    NCLEGEND-1-20
Re: Capturing Unique Data
by Marshall (Canon) on Aug 18, 2009 at 19:00 UTC
    I took a guess here since these 11-20 and 1-10 things look suspiciously like dates. Now maybe these are chapter numbers or something like that? I'm not sure.

    One point is that the normal alpha-numeric sort works if you have leading zeroes. Otherwise, the sort order will not be numeric. So I just added a leading zero for the single digits; this is what you would need to sort chapters or dates easily.

    Then I used a hash to count the number of occurrences of each digit combo, sorted by that number combo, and printed the result.

    #!/usr/bin/perl -w
    use strict;

    my %date_hash;
    my @data = qw( NCLEGEND-11-20 NCLEGEND-11-20 NCLEGEND-11-2
                   NCLEGEND-1-10  NCLEGEND-1-10  NCLEGEND-1-10
                   NCLEGEND-1-20  NCLEGEND-1-20 );

    foreach my $line (@data)
    {
        chomp ($line);              # needed if @data is a file handle
        $line =~ s/-(\d)-/-0$1-/;   # add leading zero for month
        $line =~ s/-(\d)$/-0$1/;    # add leading zero for date
        my $date = ($line =~ m/NCLEGEND\-([\d-]+)/)[0];   # get the "num" part
        $date_hash{$date}++;
    }

    foreach my $date (sort keys %date_hash)
    {
        print "$date $date_hash{$date}\n";
        # just print "$date\n"; if no need for counter value
    }

    __END__
    Prints:
    01-10 3
    01-20 2
    11-02 1
    11-20 2
    Update: This is such an important part of being able to easily sort reports by date that some "amplification" is justified: "2009-08-05" is FAR superior to just "2009-8-5", because the natural alpha sort order will do the right thing for the longer string with leading zeroes. For times, the same thing goes: 01:25 is FAR better than 01:25 AM, and if you mean 01:25 PM, use 24-hour time, 13:25. "2009-08-05 13:25" can be sorted against "2008-08-15 01:25" with just the basic sort in Perl.
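    For example (the sample timestamps here are invented), a plain string sort is all it takes once everything is zero-padded and ordered year-month-day:

    use strict;
    use warnings;

    my @stamps = ('2009-08-05 13:25', '2008-08-15 01:25', '2009-08-05 01:25');
    print "$_\n" for sort @stamps;   # default string sort yields chronological order

    # Prints:
    # 2008-08-15 01:25
    # 2009-08-05 01:25
    # 2009-08-05 13:25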
Re: Capturing Unique Data
by sanku (Beadle) on Aug 25, 2009 at 04:56 UTC
    use strict;
    use warnings;

    open(my $fh, '<', 'vv.txt') or die $!;
    my @file111 = <$fh>;
    close $fh;
    chomp @file111;   # so the printed values don't carry trailing newlines

    my %ss;
    my @unique = grep { !$ss{$_}++ } @file111;

    foreach my $i (0 .. $#unique) {
        my $nc = "nc" . ($i + 1);
        print "$nc = $unique[$i]\n";
    }