sanju7 has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to parse a standard CSV file and generate an output file that contains all the records in CSV format (without any whitespace-only or blank lines between them, etc.).
The specimen input file:
-------------------------
There is a leading space before each email address that follows a comma, with a few exceptions where the address comes directly after the comma. Since the input file was generated by a PostgreSQL (pgsql) export to CSV, it also has its share of blank (whitespace-only) lines here and there. In addition, lines with fewer records are padded with trailing blanks up to the newline.

```
 argument_value
------------------------------------------------------------
 alay@nkk.com
 brps@nkk.com, luin@nkk.com
 sthn@nkk.com
 toen@nkk.com
 mara@nkk.com
 alay@nbkk.com
 wnrd@nkk.com, jpnd@ckk.com, Daim@nkk.com, nbic@ckk.com, nbrs@crawford.com, nbc1@Ckk.com,jodo@nkk.com, mara@nkk.com
 trrt@nkk.com
 alay@nkk.com
 alam@mkk.com, Case@nkk.com, miob@ikk.com, JTny@ikk.com, RBwn@ikk.com, jsab@ikk.com, Shli@nkk.com, Stee@nkk.com, Eron@nkk.com
```
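As a minimal sketch of coping with these quirks one line at a time (assuming no address ever contains a comma, which holds for the sample), each raw line can be split on commas, each piece trimmed, and empty or whitespace-only pieces dropped. The helper name `fields_from_line` is hypothetical, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one raw line of the export into clean fields.
# Assumes an address never contains a comma (true for the sample above).
sub fields_from_line {
    my ($line) = @_;
    return map  { my $f = $_; $f =~ s/^\s+|\s+$//g; $f }  # trim blanks and the newline
           grep { /\S/ }                                   # drop empty/whitespace-only pieces
           split /,/, $line;
}

my @demo = fields_from_line(' nbc1@Ckk.com,jodo@nkk.com, mara@nkk.com   ' . "\n");
print join('|', @demo), "\n";   # prints: nbc1@Ckk.com|jodo@nkk.com|mara@nkk.com
```

A whitespace-only line yields an empty list, so blank lines simply disappear.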
Expected output:
----------------
A simple output: only the unique records (email addresses), all comma-separated on one line:

```
alay@nkk.com, brps@nkk.com, luin@nkk.com, sthn@nkk.com, toen@nkk.com, mara@nkk.com, wnrd@nkk.com, jpnd@ckk.com, Daim@nkk.com, nbic@ckk.com, nbrs@crawford.com, nbc1@Ckk.com, jodo@nkk.com, trrt@nkk.com, Case@nkk.com, miob@ikk.com, JTny@ikk.com, RBwn@ikk.com, jsab@ikk.com, Shli@nkk.com, Stee@nkk.com, Eron@nkk.com
```

That is, all the unique records in CSV format.
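Producing that single line boils down to an order-preserving "seen it already?" filter followed by a `join`. This is a sketch of the standard `%seen` hash idiom; the sub name `unique_in_order` is hypothetical. Comparison is exact and case-sensitive, which matches the expected output keeping addresses such as `nbc1@Ckk.com` as written.

```perl
use strict;
use warnings;

# Keep the first occurrence of each value, preserving input order.
# Exact, case-sensitive comparison.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @all = ('alay@nkk.com', 'brps@nkk.com', 'alay@nkk.com', 'mara@nkk.com', 'mara@nkk.com');
print join(', ', unique_in_order(@all)), "\n";
# prints: alay@nkk.com, brps@nkk.com, mara@nkk.com
```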
I am using a script that eliminates the whitespace from the input file, removes duplicate email records when they occur on the same line, and prints one line at a time. As a result, the data comes out one line at a time, still with trailing blanks after some of the records (as in the input file), which is not what I want. I believe I need a way to pick out each record from every line the script reads and then write the records one at a time to an output file; that way each input record could be checked against the records already written, eliminating duplicates, blank lines and so on. I need your help to find a way to accomplish this.
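A minimal sketch of exactly that idea: read every line, split it into addresses, and check each one against a single `%seen` hash that lives for the whole file rather than being reset per line. The sub name `unique_emails_from_fh` is hypothetical, and the demo reads from an in-memory string (Perl's `open` on a scalar reference) instead of `/ppp.csv` so it runs anywhere.

```perl
use strict;
use warnings;

# Read every line from $fh, split it into addresses, and keep only the
# first occurrence of each address across the WHOLE file (one %seen
# shared by all lines, not a fresh one per line).
sub unique_emails_from_fh {
    my ($fh) = @_;
    my (%seen, @unique);
    while (my $line = <$fh>) {
        for my $field (split /,/, $line) {
            $field =~ s/^\s+|\s+$//g;   # trim surrounding blanks and the newline
            next if $field eq '';       # skip blank lines and empty pieces
            push @unique, $field unless $seen{$field}++;
        }
    }
    return @unique;
}

# Demo on a small in-memory "file" shaped like the sample.
my $sample = join "\n",
    ' alay@nkk.com',
    ' brps@nkk.com, luin@nkk.com   ',
    '      ',
    ' mara@nkk.com',
    ' alay@nkk.com, mara@nkk.com';
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print join(', ', unique_emails_from_fh($in)), "\n";
# prints: alay@nkk.com, brps@nkk.com, luin@nkk.com, mara@nkk.com
```

For the real export, open `/ppp.csv` in place of the in-memory string, and print the joined line to an output handle opened with `'>'` instead of STDOUT.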
My script:
----------

```perl
# Require CPAN module for parsing CSV text files
use Text::CSV;

package MAIN ;
{
    # Store our CSV file name
    my $file = '/ppp.csv';

    # Obtain a file handle for our CSV file, or die upon failure
    open (CSV, '<', $file) or die('Unable to open csv file ', $file, "\n");

    # Obtain a Text::CSV object
    my $csv = new Text::CSV;

    # Loop on the lines in the CSV file
    foreach my $line (<CSV>) {
        # If the line parses successfully, print
        # otherwise, report the failure
        if ($csv->parse($line)) {
            # Extract current line's data as an array
            my @data = $csv->fields();
            #print $data[0], "\t",  # The name
            #      $data[2], "\n";  # The email address

            sub remove_duplicates(\@) {
                my $ar = shift ;
                my %seen;
                for ( my $i = 0; $i <= $#{$ar} ; ) {
                    splice @$ar, --$i, 1
                        if $seen{$ar->[$i++]}++;
                }
            }

            remove_duplicates( @data );
            print "@data\n";
        }
        else {
            print 'Unable to parse CSV line: ', $line, "\n";
        }
    }

    # Close file handle
    close(CSV);
}
1;
__END__
```
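Two things in the script above let duplicates survive. First, `remove_duplicates` builds a fresh `%seen` on every call, i.e. once per line, so an address repeated on a later line is never caught. Second, with default settings Text::CSV keeps the space after each comma as part of an unquoted field, so `' mara@nkk.com'` and `'mara@nkk.com'` compare as different strings; Text::CSV's `allow_whitespace` attribute is meant to strip that whitespace around separators (check the documentation for your installed version). The sketch below isolates the first point with two hypothetical subs, using plain array references instead of Text::CSV so it runs without CPAN modules.

```perl
use strict;
use warnings;

# Per-line de-duplication (what the script above effectively does):
# a fresh %seen for every row, so a repeat on a later row survives.
sub dedup_per_line {
    my @out;
    for my $row (@_) {
        my %seen;                                   # reset for each row
        push @out, grep { !$seen{$_}++ } @$row;
    }
    return @out;
}

# File-wide de-duplication: one %seen shared by all rows.
sub dedup_across_lines {
    my %seen;                                       # lives for all rows
    return grep { !$seen{$_}++ } map { @$_ } @_;
}

my @rows = (['alay@nkk.com'], ['mara@nkk.com', 'alay@nkk.com'], ['mara@nkk.com']);
print join(', ', dedup_per_line(@rows)), "\n";
# prints: alay@nkk.com, mara@nkk.com, alay@nkk.com, mara@nkk.com
print join(', ', dedup_across_lines(@rows)), "\n";
# prints: alay@nkk.com, mara@nkk.com
```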
Re: duplicate records in a csv file
by roboticus (Chancellor) on Jun 30, 2010 at 22:09 UTC
by sanju7 (Acolyte) on Jul 01, 2010 at 09:24 UTC

Re: duplicate records in a csv file
by Tux (Canon) on Jul 01, 2010 at 06:05 UTC

Re: duplicate records in a csv file
by Marshall (Canon) on Jul 03, 2010 at 13:12 UTC
by sanju7 (Acolyte) on Jul 14, 2010 at 06:45 UTC