Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All

I have a small problem, I have an sql database with 900 names and emails - I need to send out an email to each person in the database, but looking through the entries a lot of emails appear 3/4 times - so I dont want to do a simple while loop and send the email to these people 4 times.

I'm not sure how to approach this to make sure I only send one email per user on my system, any tips would be very helpfull.

Replies are listed 'Best First'.
Re: SQL Results
by Moonie (Friar) on Feb 11, 2002 at 22:01 UTC
    Why don't you do a query that would select the unique users? For example:
    SELECT DISTINCT email FROM users

    ... Just an idea.

    - Moon
Re: SQL Results
by jsegal (Friar) on Feb 11, 2002 at 22:11 UTC
    If the emails are exact, you could use "distinct" in your SQL statement. E.G. something like
    select distinct E_mail from table where <conditions>
    That would be the easiest approach -- let the database do the work. You could do the same work in perl using a hash to keep track of the addresses, ingoring the values of the hash.

    e.g. something like (in pseudo-perl)
    my %emailhash = (); foreach $row (result_set) { $emailhash{$row->{email}} = 1; } my @unique_emails = keys %emailhash;
    Using a hash where you only care about the keys (setting the values to 1) is a common idiom for ensuring uniqueness from a list when you don't care about the order. A concise form of this idiom would be
    %temp_hash = map {$_ => 1} @non_unique_list; @unique_list = keys (%temp_hash); # or @unique_list = sort keys (%temp_hash);
    The latter case can be used if you want the list ordered, and can extract the order from the values of the entries themselves.

    Hope this helps...

    -JAS
Re: SQL Results
by vek (Prior) on Feb 11, 2002 at 22:07 UTC
    If you store the e-mail address as a hash key you will eliminate duplicates when doing a
    for (keys(%theEmailAddressHash)) { # your email code here... }
    ...Just another idea.
Re: SQL Results
by Speedy (Monk) on Feb 12, 2002 at 04:39 UTC
    If you use a hash technique to remove duplicates, be careful of the case. Otherwise John@isp.net will be different from john@isp.net, and John john will get two copies of the email.

    When saving email addresses from Web forms I generally make the stored results lower case.
      Most correct about case!

      Here is one way to elimate 'would-be' duplicates who simply are not because of case sensitivity:

      use strict; my %unique; my @email = qw( John@foo.com john@foo.com CAPTELO@microshaft.con CAPTELO@microshaft.com captntenille@muskrat.com CaPtNtEnIlLe@muskrat.com ); for (@email) { my $key = lc $_; $unique{$key} = $_ unless ($unique{$key} and $unique{$key} =~ /[A-Z +]/); } print join("\n", values %unique), "\n";
      Yields the list:
      CAPTELO@microshaft.com
      CaPtNtEnIlLe@muskrat.com
      CAPTELO@microshaft.con
      John@foo.com
      
      Of course, this is NOT perfect. Just some random thoughts if you will. ;) If you are not concerned with the extra overhead of space, i recommend storing the email address twice: once as it is, and once all lower-cased. Then you could issue a SQL statement like:
      SELECT distinct lc_email, email, ...
      
      and only use the 'email' column. Actually, even better might be performing a comparison before you insert the email address - make sure that a lower case version of the candidate email address does not already exist in the database first.

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
Re: SQL Results
by jonjacobmoon (Pilgrim) on Feb 11, 2002 at 22:19 UTC
    Frankly, this question has little or nothing to do with Perl

    However, I will say one thing and that is it sounds like you need to rethink your db schema. Perhaps you should have a seperate table that is indexed by email with an id that is a foreign key in the current table you are referencing. It will make tasks like this easier and your database much more efficent.

    If you are not familiar with database normalization, I recommend you find out more about it. I don't have a URL to point your to but I am sure google will have plenty.


    I admit it, I am Paco.