Your input file format is not complex enough to need the full power of a CSV parsing module; when things get hairy, a module like that is definitely needed. Here, though, I would use a simple regex, less than an inch long, with a global match. The regex below just tokenizes runs of characters that are in the character set, which has the effect of throwing away the commas, spaces, newlines, etc.
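To show the tokenizing on its own, here is a minimal sketch run against a single made-up line. The sample string and the variable names are my own assumptions, not taken from your file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line, in the same style as the input data.
my $line = 'wnrd@nkk.com, jpnd@ckk.com,Daim@nkk.com';

# In list context, a /g match returns every run made up only of
# word characters, '@' and '.'; the separators are never matched.
my @tokens = $line =~ /[\w@.]+/g;

print "$_\n" for @tokens;
```

One caveat: \w covers only letters, digits and underscore, so an address containing a hyphen or a plus sign would be split at that character. If your real data has such addresses, the character class would need widening.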

From what I could tell from your desired output, you want unique user names (keep the first one seen) as opposed to simply unique e-mail addresses, so I adjusted the grep{},%seen idea below to do that just on the stuff in front of the "@".
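The grep/%seen step is dense as a one-liner, so here it is unpacked as a rough sketch. The sub name unique_by_user and the three sample addresses are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep the first address seen for each user name,
# where "user name" means everything in front of the first '@'.
sub unique_by_user {
    my @addresses = @_;
    my %seen;
    return grep {
        my ($user) = split /@/, $_;   # first field = part before '@'
        !$seen{$user}++;              # true only the first time we see it
    } @addresses;
}

my @kept = unique_by_user(qw(alay@nkk.com brps@nkk.com alay@nbkk.com));
print join(',', @kept), "\n";   # prints alay@nkk.com,brps@nkk.com
```

Note that the comparison is case-sensitive, so "Alay" and "alay" would count as different users. Whether that is what you want depends on your data.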

In the join, change ',' to ', ' to add a space after the comma if that is what you want. CSV files usually do not have leading spaces.

#!/usr/bin/perl -w
use strict;

my @emailaddr;
while (<DATA>)
{
   if (/@/)  #skip lines without email addresses
   {
      # tokenize the line; commas and whitespace are discarded
      push (@emailaddr, $_) foreach /[\w@.]+/g;
   }
}

# keep only the first address seen for each user name (part before "@")
my %seen;
@emailaddr = grep { !$seen{(split(/[@]/,$_))[0]}++ } @emailaddr;
print join(',',@emailaddr),"\n";

#output line
#alay@nkk.com,brps@nkk.com,luin@nkk.com,sthn@nkk.com,toen@nkk.com,mara@nkk.com,wnrd@nkk.com,jpnd@ckk.com,Daim@nkk.com,nbic@ckk.com,nbrs@crawford.com,nbc1@Ckk.com,jodo@nkk.com,trrt@nkk.com,alam@mkk.com,Case@nkk.com,miob@ikk.com,JTny@ikk.com,RBwn@ikk.com,jsab@ikk.com,Shli@nkk.com,Stee@nkk.com,Eron@nkk.com

__DATA__
            argument_value
----------------------------------------------------------------------
alay@nkk.com
brps@nkk.com, luin@nkk.com
sthn@nkk.com
toen@nkk.com
mara@nkk.com
alay@nbkk.com
wnrd@nkk.com, jpnd@ckk.com, Daim@nkk.com, nbic@ckk.com, nbrs@crawford.com, nbc1@Ckk.com,jodo@nkk.com, mara@nkk.com
trrt@nkk.com
alay@nkk.com
alam@mkk.com, Case@nkk.com, miob@ikk.com, JTny@ikk.com, RBwn@ikk.com, jsab@ikk.com, Shli@nkk.com, Stee@nkk.com, Eron@nkk.com

In reply to Re: duplicate records in a csv file by Marshall
in thread duplicate records in a csv file by sanju7
