lkperl has asked for the wisdom of the Perl Monks concerning the following question:
Hello all,
First of all thank you for your help. I use strict in my script, but this doesn't appear here.
I have a text file with sentences, one per line. A Perl script extracts each sentence and stores it in a temporary array (@temp). Then I use the following code to extract the duplicates.
@duplicates = grep $seen{$_}++, @temp;
# here we count how many times each duplicate appears
%seen = ();
foreach my $item (@duplicates) {
    $seen{$item}++;
}
@unique_duplicates = keys %seen;
Everything works fine.
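For readers following along, here is a minimal sketch of the same idiom on a tiny, made-up @temp (the sample sentences are hypothetical, not from the original file). One thing it illustrates: after the grep, %seen already holds the total number of occurrences of each sentence, whereas counting over @duplicates gives one less than that, since the first occurrence is never a duplicate.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the sentences read from the file.
my @temp = ('a cat.', 'a dog.', 'a cat.', 'a cat.', 'a dog.', 'a bird.');

# grep keeps an element every time it has been seen before, so a sentence
# that occurs n times ends up in @duplicates n-1 times.
my %seen;
my @duplicates = grep { $seen{$_}++ } @temp;

# %seen now maps each sentence to its total count; keep those seen more than once.
my @unique_duplicates = sort grep { $seen{$_} > 1 } keys %seen;

print "$_ => $seen{$_}\n" for @unique_duplicates;
```

With this sample, 'a cat.' is reported with a count of 3 and 'a dog.' with a count of 2, while 'a bird.' is left out.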
Now I'd like to make my code object-oriented. The script basically reads each sentence and creates a new object:
$record = Entry->new();
$record->id(1);
$record->duplicate(0);
$record->src("This is a duplicate.");
push @records, $record;
Here we additionally have an ID, a 'duplicate' flag set to 0, and the sentence.
I push the $record onto an array for later use. And this is basically where my problems start. I'm used to working with arrays and hashes, but here each array element is itself a hash:
print @records
gives you
Entry=HASH(0x183efe8)Entry=HASH(0x1835288) etc.
I have lots of sentences to process (up to 500'000), so that's why I'd prefer to avoid too many loops and extract the duplicates in one pass.
So to summarize, I'd need to identify the duplicates and set the flag 'duplicate' to the number of times the duplicate sentence appears.
Before doing the object-oriented code, the approach was simple. But here it is getting more complicated.
I thank you for your help.
Larry
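One possible way to do what the post summarizes, offered as a sketch rather than a definitive answer: count the sentences in a hash keyed on the text returned by src, then walk the records once more and store the count in the duplicate flag. The Entry class below is a hypothetical stand-in (the post does not show the real one), and the sample sentences are made up. This takes two linear passes over @records, which should stay cheap even for 500'000 entries because each step is a single hash lookup.

```perl
use strict;
use warnings;

# Hypothetical minimal Entry class; the real one is not shown in the post.
{
    package Entry;

    sub new {
        my ($class) = @_;
        return bless { id => undef, duplicate => 0, src => '' }, $class;
    }

    # Combined getter/setter per field, matching the calls used in the post.
    for my $field (qw(id duplicate src)) {
        no strict 'refs';
        *{"Entry::$field"} = sub {
            my $self = shift;
            $self->{$field} = shift if @_;
            return $self->{$field};
        };
    }
}

# Made-up sentences standing in for the lines of the text file.
my @sentences = ('This is a duplicate.', 'This is unique.', 'This is a duplicate.');

my @records;
my $next_id = 1;
for my $sentence (@sentences) {
    my $record = Entry->new();
    $record->id($next_id++);
    $record->duplicate(0);
    $record->src($sentence);
    push @records, $record;
}

# Pass 1: count how often each sentence text occurs.
my %count;
$count{ $_->src }++ for @records;

# Pass 2: store the count on every record whose sentence occurs more than once.
for my $record (@records) {
    my $n = $count{ $record->src };
    $record->duplicate($n) if $n > 1;
}

printf "%d %d %s\n", $_->id, $_->duplicate, $_->src for @records;
```

Records whose sentence occurs only once keep their flag at 0, so the flag can still be tested as a boolean.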
Replies are listed 'Best First'.
Re: objects and duplicates
by wfsp (Abbot) on Apr 27, 2008 at 17:49 UTC

Re: objects and duplicates
by stiller (Friar) on Apr 27, 2008 at 18:20 UTC
by Anonymous Monk on Apr 27, 2008 at 18:45 UTC
by pc88mxer (Vicar) on Apr 27, 2008 at 18:50 UTC
by lkperl (Initiate) on Apr 27, 2008 at 19:19 UTC
by pc88mxer (Vicar) on Apr 27, 2008 at 19:50 UTC
by stiller (Friar) on Apr 27, 2008 at 18:56 UTC

Re: objects and duplicates
by dragonchild (Archbishop) on Apr 27, 2008 at 20:45 UTC