Hello Monks,

I am trying to parse a standard CSV file and generate an output file that holds all the records in CSV format, with no whitespace-only or blank lines between them.

The specimen input file: -------------------------

argument_value
----------------------------------------------------------------------
alay@nkk.com brps@nkk.com, luin@nkk.com sthn@nkk.com toen@nkk.com mara@nkk.com alay@nbkk.com wnrd@nkk.com, jpnd@ckk.com, Daim@nkk.com, nbic@ckk.com, nbrs@crawford.com, nbc1@Ckk.com,jodo@nkk.com, mara@nkk.com trrt@nkk.com alay@nkk.com alam@mkk.com, Case@nkk.com, miob@ikk.com, JTny@ikk.com, RBwn@ikk.com, jsab@ikk.com, Shli@nkk.com, Stee@nkk.com, Eron@nkk.com
Most email addresses have a leading space right after the comma, but there are a few exceptions where the address follows the comma directly. The input file was generated by a pgsql export to CSV, so it has its share of blank (whitespace-only) lines scattered between the data lines. Lines that hold fewer records are also padded with empty fields (not whitespace) up to the newline.

Expected output: ----------------

alay@nkk.com, brps@nkk.com, luin@nkk.com, sthn@nkk.com, toen@nkk.com, mara@nkk.com, wnrd@nkk.com, jpnd@ckk.com, Daim@nkk.com, nbic@ckk.com, nbrs@crawford.com, nbc1@Ckk.com, jodo@nkk.com, trrt@nkk.com, Case@nkk.com, miob@ikk.com, JTny@ikk.com, RBwn@ikk.com, jsab@ikk.com, Shli@nkk.com, Stee@nkk.com, Eron@nkk.com
The output should be simple: a single comma-separated line in CSV format, containing only the unique records (email addresses).

I am using a script that strips the whitespace from the input file, removes duplicate email records when they occur on the same line, and prints one line at a time. As a result the data comes out one line at a time, and some lines still end with the empty fields that are present in the input file, which is not what I want. I believe I need a way to pick each record out of every line read from the input file and write the records to the output file one at a time, so that each incoming record can be checked against the records already written. That would eliminate the duplicates, the blank lines and so on. I need your help to find a way to accomplish this.

my script: ----------
# Require CPAN module for parsing CSV text files
use Text::CSV;

package MAIN;
{
    # Store our CSV file name
    my $file = '/ppp.csv';

    # Obtain a file handle for our CSV file, or die upon failure
    open (CSV, '<', $file) or die('Unable to open csv file ', $file, "\n");

    # Obtain a Text::CSV object
    my $csv = new Text::CSV;

    # Loop on the lines in the CSV file
    foreach my $line (<CSV>) {
        # If the line parses successfully, print
        # otherwise, report the failure
        if ($csv->parse($line)) {
            # Extract current line's data as an array
            my @data = $csv->fields();
            #print $data[0], "\t",    # The name
            #      $data[2], "\n";    # The email address

            sub remove_duplicates(\@) {
                my $ar = shift;
                my %seen;
                for ( my $i = 0; $i <= $#{$ar} ; ) {
                    splice @$ar, --$i, 1 if $seen{$ar->[$i++]}++;
                }
            }

            remove_duplicates( @data );
            print "@data\n";
        }
        else {
            print 'Unable to parse CSV line: ', $line, "\n";
        }
    }

    # Close file handle
    close(CSV);
}
1;
__END__
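
One possible way to get from per-line de-duplication to the single-line result described above is to keep one `%seen` hash for the whole file instead of one per line. The following is only a minimal sketch under stated assumptions, not a drop-in replacement for the script: the helper name `unique_emails` and the sample lines are hypothetical, it uses a plain `split` rather than Text::CSV, and it assumes that any field without an `@` (the header, the dashed ruler, empty fields) is not a record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return the unique
# e-mail addresses found in a list of raw lines, in first-seen order.
sub unique_emails {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        # Split on runs of commas and/or whitespace, so "a@x.com,b@x.com"
        # and "a@x.com, b@x.com" are treated the same way.
        for my $field (split /[\s,]+/, $line) {
            # Assumption: a field without "@" is not an e-mail record.
            next unless $field =~ /@/;
            # %seen spans all lines, so duplicates are dropped file-wide.
            push @unique, $field unless $seen{$field}++;
        }
    }
    return @unique;
}

# Made-up sample lines standing in for the contents of the input file.
my @sample = (
    '          argument_value',
    '----------------------',
    ' alay@nkk.com',
    '',
    ' brps@nkk.com, luin@nkk.com',
    ' alay@nkk.com',
    ' nbc1@Ckk.com,jodo@nkk.com,   ',
);

# Print every unique address on one comma-separated line.
print join(', ', unique_emails(@sample)), "\n";
```

With the sample lines above this prints `alay@nkk.com, brps@nkk.com, luin@nkk.com, nbc1@Ckk.com, jodo@nkk.com`. To run it against a real file, the lines read from a lexical filehandle opened on that file can be passed in place of `@sample`. The comparison is case-sensitive, so `Case@nkk.com` and `case@nkk.com` would count as two different records; lower-casing the hash key would change that.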

In reply to duplicate records in a csv file by sanju7
