SQL Results

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: SQL Results by Moonie (Friar) on Feb 11, 2002 at 22:01 UTC
Why don't you do a query that would select the unique users? For example: `SELECT DISTINCT email FROM users` [download] ... Just an idea. - Moon	[reply] [d/l]
Re: SQL Results by jsegal (Friar) on Feb 11, 2002 at 22:11 UTC
If the emails are exact, you could use "distinct" in your SQL statement. E.G. something like `select distinct E_mail from table where <conditions>` [download] That would be the easiest approach -- let the database do the work. You could do the same work in perl using a hash to keep track of the addresses, ingoring the values of the hash. e.g. something like (in pseudo-perl) `my %emailhash = (); foreach $row (result_set) { $emailhash{$row->{email}} = 1; } my @unique_emails = keys %emailhash;` [download] Using a hash where you only care about the keys (setting the values to 1) is a common idiom for ensuring uniqueness from a list when you don't care about the order. A concise form of this idiom would be `%temp_hash = map {$_ => 1} @non_unique_list; @unique_list = keys (%temp_hash); # or @unique_list = sort keys (%temp_hash);` [download] The latter case can be used if you want the list ordered, and can extract the order from the values of the entries themselves. Hope this helps... -JAS	[reply] [d/l] [select]
Re: SQL Results by vek (Prior) on Feb 11, 2002 at 22:07 UTC
If you store the e-mail address as a hash key you will eliminate duplicates when doing a `for (keys(%theEmailAddressHash)) { # your email code here... }` [download] ...Just another idea.	[reply] [d/l]
Re: SQL Results by Speedy (Monk) on Feb 12, 2002 at 04:39 UTC
If you use a hash technique to remove duplicates, be careful of the case. Otherwise John@isp.net will be different from john@isp.net, and John john will get two copies of the email. When saving email addresses from Web forms I generally make the stored results lower case.	[reply]
Re: Re: SQL Results by theorbtwo (Prior) on Feb 12, 2002 at 13:17 UTC
Careful here. Though in most cases, the username part of email addresses is case insensitive, I'm fairly certian that this isn't gaurnteed. (The domainname part /is/ gaurnteed to be case insensitive, however.) `TACCTGTTTGAGTGTAACAATCATTCGCTCGGTGTATCCATCTTTG ACACAATGAATCTTTGACTCGAACAATCGTTCGGTCGCTCCGACGC`	[reply]
(jeffa) 2Re: SQL Results by jeffa (Bishop) on Feb 12, 2002 at 15:29 UTC
Most correct about case! Here is one way to elimate 'would-be' duplicates who simply are not because of case sensitivity: `use strict; my %unique; my @email = qw( John@foo.com john@foo.com CAPTELO@microshaft.con CAPTELO@microshaft.com captntenille@muskrat.com CaPtNtEnIlLe@muskrat.com ); for (@email) { my $key = lc $_; $unique{$key} = $_ unless ($unique{$key} and $unique{$key} =~ /[A-Z +]/); } print join("\n", values %unique), "\n";` [download] Yields the list: CAPTELO@microshaft.com CaPtNtEnIlLe@muskrat.com CAPTELO@microshaft.con John@foo.com Of course, this is NOT perfect. Just some random thoughts if you will. ;) If you are not concerned with the extra overhead of space, i recommend storing the email address twice: once as it is, and once all lower-cased. Then you could issue a SQL statement like: SELECT distinct lc_email, email, ... and only use the 'email' column. Actually, even better might be performing a comparison before you insert the email address - make sure that a lower case version of the candidate email address does not already exist in the database first. jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)	[reply] [d/l]
Re: SQL Results by jonjacobmoon (Pilgrim) on Feb 11, 2002 at 22:19 UTC
Frankly, this question has little or nothing to do with Perl However, I will say one thing and that is it sounds like you need to rethink your db schema. Perhaps you should have a seperate table that is indexed by email with an id that is a foreign key in the current table you are referencing. It will make tasks like this easier and your database much more efficent. If you are not familiar with database normalization, I recommend you find out more about it. I don't have a URL to point your to but I am sure google will have plenty. I admit it, I am Paco.	[reply]