in reply to duplicate records in a csv file

Your input file format is not complex enough to need all the power of the parse csv module - when things get hairy that thing is definitely needed. But here, I would use a simple less than one inch regex and match global for the job. The regex below just tokenizes chunks of characters that are in the character set. This has the effect of throwing away the commas, spaces, new lines, etc.

From what I could tell from your desired output, you want unique user names (keep the first one seen) as opposed to simply unique e-mail addresses, so I adjusted the grep{},%seen idea below to do that just on the stuff in front of the "@".

In the join, change ',' to ', ' to add a space after the comma if that is what you want. CSV files usually do not have leading spaces.

#!/usr/bin/perl -w use strict; my @emailaddr; while (<DATA>) { if (/@/) #skip lines without email addresses { push (@emailaddr, $_) foreach /[\w@.]+/g; } } my %seen; @emailaddr = grep { !$seen{(split(/[@]/,$_))[0]}++}@emailaddr; print join(',',@emailaddr),"\n"; #output line #alay@nkk.com,brps@nkk.com,luin@nkk.com,sthn@nkk.com,toen@nkk.com,mara +@nkk.com,wnrd@nkk.com,jpnd@ckk.com,Daim@nkk.com,nbic@ckk.com,nbrs@cra +wford.com,nbc1@Ckk.com,jodo@nkk.com,trrt@nkk.com,alam@mkk.com,Case@nk +k.com,miob@ikk.com,JTny@ikk.com,RBwn@ikk.com,jsab@ikk.com,Shli@nkk.co +m,Stee@nkk.com,Eron@nkk.com __DATA__ + argument_value + + + ---------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +----------------------------------------------- alay@nkk.com brps@nkk.com, luin@nkk.com sthn@nkk.com toen@nkk.com mara@nkk.com alay@nbkk.com wnrd@nkk.com, jpnd@ckk.com, Daim@nkk.com, nbic@ckk.com, nbrs@crawford +.com, nbc1@Ckk.com,jodo@nkk.com, mara@nkk.com trrt@nkk.com alay@nkk.com alam@mkk.com, Case@nkk.com, miob@ikk.com, JTny@ikk.com, RBwn@ikk.com, + jsab@ikk.com, Shli@nkk.com, Stee@nkk.com, Eron@nkk.com

Replies are listed 'Best First'.
Re^2: duplicate records in a csv file
by sanju7 (Acolyte) on Jul 14, 2010 at 06:45 UTC
    This works like a charm. Thank you for your time.