Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


Hi,

I have a file that contains Id, Name, Type,

but sometimes the Name is repeated more than once.

I need to have only one name.

How can I do it?

My file looks like this:

Id=1,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat

Id=2,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog

I need the file to look like this:

1,anahnata,cat

2,hahhhhaa,dog
Thanks

Re: how to substitute next line with nothing
by CountZero (Bishop) on Feb 23, 2009 at 06:53 UTC
    There is more than one way to do it:
    use strict;
    use Text::CSV;

    my $csv = Text::CSV->new ();
    my %unique_records;

    while (my $record = <DATA>) {
        $csv->parse($record) or die "Could not parse $record";
        my @columns = $csv->fields();
        # keep only what follows the "=" in each column
        s/.*=\s*(.*)/$1/ for @columns;
        # print a record only the first time its name is seen
        print "$columns[0],$columns[1],$columns[4]\n" unless $unique_records{$columns[1]}++ ;
    }

    __DATA__
    Id=1,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat
    Id=2,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog
    Id=3,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat
    Id=4,DE RecName: Full=hihahiha,DE RecName: Full=sure,DE RecName: Full=sue, Type = horse
    Id=5,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat
    Id=6,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog
    I added some duplicate records so you can see that indeed only the first of each duplicate record is printed.
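The "print unless $hash{$key}++" test is the usual first-occurrence filter. As a minimal sketch of the same idiom on its own, using only core Perl: the list below simply repeats the first name of each of the six sample records, and the variable names (@names, %seen, @unique) are illustrative, not taken from the post.

```perl
use strict;
use warnings;

# First names of the six sample records, in input order.
my @names = qw(anahnata hahhhhaa anahnata hihahiha anahnata hahhhhaa);

my %seen;
# $seen{$_}++ returns the count *before* incrementing: 0 (false) the
# first time a name shows up, so !$seen{$_}++ is true only then.
my @unique = grep { !$seen{$_}++ } @names;

print join(',', @unique), "\n";
```

After the grep, %seen also holds how often each name occurred, which can be handy for reporting the duplicates.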

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: how to substitute next line with nothing
by lakshmananindia (Chaplain) on Feb 23, 2009 at 06:18 UTC

    You can use split to accomplish this (see perldoc -f split).
      split is indeed the basis of getting the fields from each record, but it is generally a poor choice for CSV files: a plain split on commas breaks as soon as a quoted field contains a comma. Text::CSV takes care of such edge cases and will save you a lot of trouble in the long run.
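To illustrate the kind of edge case meant here, a small sketch follows. The record in it is hypothetical (the original poster's data has no quoted fields), and it uses parse_line from the core module Text::ParseWords as a quote-aware stand-in so that it runs without installing anything; it is not a replacement for Text::CSV on real CSV data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record: the name field is quoted and contains a comma.
my $line = '1,"Smith, John",cat';

# A plain split cuts the quoted name in two.
my @naive = split /,/, $line;

# parse_line(delimiter, keep_quotes, line) respects the quotes.
my @aware = parse_line(',', 0, $line);

printf "split:      %d fields\n", scalar @naive;
printf "parse_line: %d fields (%s)\n", scalar @aware, join('|', @aware);
```

The plain split returns four fields, the quote-aware parse returns the intended three.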

      CountZero


      I think the question was not how to extract the fields, but how to exclude duplicate records.

      I think in this case it's easiest to just use a hash to record the names, for example:

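      my %seen = ();
      ...
      while (<INPUT_FILE>) {
          chomp;
          # split on "=": $fields[1] starts with the Id, $fields[2] with
          # the first name, and $fields[5] holds the type
          my @fields = split /=/;
          my ($id)   = $fields[1] =~ /^(\d+)/;
          my ($name) = $fields[2] =~ /^(\w*)/;
          (my $type = $fields[5]) =~ s/^\s+|\s+$//g;
          # a later record with the same name overwrites the earlier one
          $seen{$name} = [$id, $type];
      }

      If a name occurs several times, the last occurrence is recorded and the other ones are discarded.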
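Whether the first or the last record for a name should survive depends on what the original poster wants, so here is a hedged sketch that keeps both side by side. It reuses three of the sample records from this thread; the regex and the names %first, %last and @order are my own choices for illustration, and //= needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Three sample records from this thread; Id=3 repeats the name of Id=1.
my @records = (
    'Id=1,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat',
    'Id=2,DE RecName: Full=hahhhhaa,DE RecName: Full=sure,DE RecName: Full=sue, Type = dog',
    'Id=3,DE RecName: Full=anahnata,DE RecName: Full=deals,DE RecName: Full=buy, Type = cat',
);

my (%first, %last, @order);
for my $record (@records) {
    # Capture the Id, the first Full= name, and the Type; skip bad lines.
    my ($id, $name, $type) =
        $record =~ /^Id=(\d+),.*?Full=([^,]*),.*Type\s*=\s*(\S+)/
        or next;
    push @order, $name unless exists $last{$name};  # remember input order
    $first{$name} //= [$id, $type];                 # first occurrence wins
    $last{$name}    = [$id, $type];                 # last occurrence wins
}

for my $name (@order) {
    print "first: ", join(',', $first{$name}[0], $name, $first{$name}[1]), "\n";
    print "last:  ", join(',', $last{$name}[0],  $name, $last{$name}[1]),  "\n";
}
```

For anahnata the "first" line carries Id 1 and the "last" line carries Id 3. The @order array is there because a plain hash does not preserve the order of the input file.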

      -- 
      Ronald Fischer <ynnor@mm.st>
Re: how to substitute next line with nothing
by velusamy (Monk) on Feb 23, 2009 at 06:28 UTC
    Hi,

    Try this:

    use strict;
    use warnings;

    open FH,"Input_File" or die "can't open file $!\n";
    open OFH,">Output_File" or die "can't open file $!\n";

    while (<FH>) {
        # $1 = Id plus comma, $2 = first name plus comma, $3 = type
        print OFH $1,$2,$3,"\n" if (/Id=(\d+,).*?Full=(.*?,).*?Type = (.*)$/) ;
    }

    The Output_File contains your required output.

      Your script does not filter out duplicate records!

      CountZero


Re: how to substitute next line with nothing
by sanku (Beadle) on Feb 23, 2009 at 07:26 UTC
    Hi friend, try this one:

    open(FILE,"file.txt") or die $!;
    while(<FILE>){
        # first, second and last comma-separated fields
        ($var1,$var2,$var3)=(split(',',$_))[0,1,-1];
        $string="$var1".",$var2".",$var3\n";
        # strip the labels and all whitespace (including the newline)
        $string=~s/(Id=)|(DE RecName: Full=)|(Type = )|\s//g;
        print $string."\n";
    }
    close(FILE);