There is more than one way to do it:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new();
my %unique_records;
while (my $record = <DATA>) {
    $csv->parse($record) or die "Could not parse $record";
    my @columns = $csv->fields();
    # strip the "label=" prefix from every field, keeping only the value
    s/.*=\s*(.*)/$1/ for @columns;
    # print id, name and type, but only the first time a name is seen
    print "$columns[0],$columns[1],$columns[4]\n"
        unless $unique_records{$columns[1]}++;
}
__DATA__
Id=1,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat
Id=2,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog
Id=3,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat
Id=4,DE RecName: Full=hihahiha,DE RecName: Full=sure,DE RecName: Full=sue, Type = horse
Id=5,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat
Id=6,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog
I added some duplicate records so you can see that indeed only the first of each duplicate record is printed.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
split is indeed the basic way to get the fields from each record, but it is generally a poor choice for CSV files. Text::CSV takes care of the edge cases, such as quoted fields that contain commas, and will save you a lot of trouble in the long run.
CountZero
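To illustrate the kind of edge case meant here, the following is a small sketch comparing a plain split with Text::CSV on a record whose second field contains a quoted comma. The sample record and the sub names fields_by_split and fields_by_csv are made up for this illustration and do not come from the original data.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample record: the second field contains a quoted comma.
my $record = 'Id=7,"DE RecName: Full=foo, bar",Type = cat';

# Naive approach: split on every comma, whether it is quoted or not.
sub fields_by_split {
    my ($line) = @_;
    return split /,/, $line;
}

# Text::CSV honours the quotes and keeps the quoted field in one piece.
sub fields_by_csv {
    my ($line) = @_;
    my $csv = Text::CSV->new({ binary => 1 });
    $csv->parse($line) or die "Could not parse $line";
    return $csv->fields();
}

my @naive  = fields_by_split($record);
my @parsed = fields_by_csv($record);
printf "split:     %d fields\n", scalar @naive;
printf "Text::CSV: %d fields\n", scalar @parsed;
```

With this record, split breaks the quoted field in two and reports one field more than Text::CSV does.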
I think the question was not how to extract the fields, but how to exclude duplicate records.
I think in this case it's easiest to just use a hash to record the names, for example:
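my %seen = ();
...
while (<INPUT_FILE>) {
    chomp;
    my @fields = split /=/;
    # use the binding operator =~ here; a plain = would overwrite the field
    my ($id)   = $fields[1] =~ /^(\d+)/;
    my ($name) = $fields[2] =~ /^(\w*)/;
    # a later record with the same name replaces the earlier one
    $seen{$name} = [$id, $fields[5]];
}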
If a name occurs several times, the last occurrence is recorded and the earlier ones are discarded.
--
Ronald Fischer <ynnor@mm.st>
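The hash approach above keeps the last record for each name. If the first one should win instead, a possible variant is sketched below. It assumes the record layout of the sample data posted in this thread; the sub name dedupe_first is hypothetical.

```perl
use strict;
use warnings;

# Return "id,name,type" strings, keeping only the first record per name.
sub dedupe_first {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        my @fields = split /=/, $line;
        next unless @fields >= 6;    # skip lines that do not fit the layout
        my ($id)   = $fields[1]  =~ /^(\d+)/     or next;
        my ($name) = $fields[2]  =~ /^(\w+)/     or next;
        my ($type) = $fields[-1] =~ /(\S+)\s*$/  or next;
        next if $seen{$name}++;      # a record with this name was already kept
        push @out, "$id,$name,$type";
    }
    return @out;
}

my @records = (
    'Id=1,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat',
    'Id=2,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog',
    'Id=3,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat',
);
print "$_\n" for dedupe_first(@records);
```

The post-increment in `$seen{$name}++` returns the old count, so the test is false only the first time a name shows up.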
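use strict;
use warnings;
# three-argument open with lexical filehandles
open my $in,  '<', 'Input_File'  or die "can't open Input_File: $!\n";
open my $out, '>', 'Output_File' or die "can't open Output_File: $!\n";
while (<$in>) {
    # capture the id, the first Full= name, and the type (the commas are part of the first two captures)
    print {$out} $1, $2, $3, "\n" if /Id=(\d+,).*?Full=(.*?,).*?Type = (.*)$/;
}
close $out or die "can't close Output_File: $!\n";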
The Output_File contains your required output.
Your script does not filter out duplicate records!
CountZero
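One way the regex approach could be extended to drop duplicates is to remember each name in a hash. This is only a sketch under the assumption that duplicates are identified by the first Full= name, as in the sample data; the sub name extract_unique and the slightly loosened regex are illustrative, not the previous poster's code.

```perl
use strict;
use warnings;

# Return "id,name,type" for a name not seen before; return nothing for a
# duplicate or for a line that does not match the expected layout.
sub extract_unique {
    my ($line, $seen) = @_;
    my ($id, $name, $type) =
        $line =~ /Id=(\d+),.*?Full=(.*?),.*?Type\s*=\s*(\S+)/
        or return;
    return if $seen->{$name}++;
    return "$id,$name,$type";
}

my %seen;
my @lines = (
    'Id=1,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat',
    'Id=3,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat',
    'Id=4,DE RecName: Full=hihahiha,DE RecName: Full=sure,DE RecName: Full=sue, Type = horse',
);
for my $line (@lines) {
    my $out = extract_unique($line, \%seen);
    print "$out\n" if defined $out;
}
```

Passing the hash by reference keeps the sub free of global state, so the same sub can be reused for several input files with separate or shared bookkeeping.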
Hi friend,
Try this one:
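open(my $fh, '<', 'file.txt') or die $!;
while (<$fh>) {
    # keep the first, second and last comma-separated fields
    my ($var1, $var2, $var3) = (split /,/)[0, 1, -1];
    my $string = "$var1,$var2,$var3";
    # strip the field labels and all whitespace, including the trailing newline
    $string =~ s/(Id=)|(DE RecName: Full=)|(Type = )|\s//g;
    print "$string\n";
}
close($fh);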