karpatov has asked for the wisdom of the Perl Monks concerning the following question:
Thank you karpatov
The program should simplify a CSV file by removing the rows in which the value of a certain column is not in a given subset of allowed values.
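A common Perl idiom for this kind of membership test is a hash used as a set. Each allowed value becomes a key, so each row costs one `exists` lookup instead of a loop over all keys, and a row is kept at most once. Below is a minimal sketch on made-up in-memory rows. `@keys`, `@rows`, `$col` and `%allowed` are hypothetical names, and the column index is assumed to be already known:

```perl
use strict;
use warnings;

# Hypothetical allowed values and CSV rows (header first).
my @keys = ('NSC1', 'NSC3');
my @rows = ("id,name\n", "NSC1,alpha\n", "NSC2,beta\n", "NSC3,gamma\n");
my $col  = 0;    # index of the column to test (assumed known here)

# Build the set: every allowed value becomes a hash key.
my %allowed = map { $_ => 1 } @keys;

# Keep the header, then keep only the rows whose tested field is in the set.
my ($header, @data) = @rows;
my @kept = ($header);
for my $row (@data) {
    chomp(my $copy = $row);               # work on a copy without the newline
    my @fields = split /,/, $copy, -1;
    push @kept, $row if exists $allowed{ $fields[$col] };
}
print @kept;
```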
What the program does:
1. Reads a file with a key ID (1st line) and its values, the keys (2nd...nth lines).
2. Reads the file to be simplified, line by line.
3. Compares the value in the column whose header equals the key ID with the keys.
4. Writes the lines that pass the test to a new file.
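Steps 1 and 3 could be sketched as two small subroutines. This assumes that the key file is tab-delimited with only its first field used, as in the script below, and that the CSV header is a plain comma-separated line. `read_keys`, `find_column` and the sample data are hypothetical.

Trimming the header names matters. The last name read from a line still ends in the newline (and in `\r` for Windows files), so without trimming it would never compare equal to the key ID.

```perl
use strict;
use warnings;

# Step 1 (hypothetical helper): the first line names the column, and each
# further line starts with a key. Only the first tab-separated field is used.
sub read_keys {
    my ($fh) = @_;
    my $first = <$fh>;
    die "key file is empty\n" unless defined $first;
    my ($key_id) = split /\t/, $first;
    $key_id =~ s/^\s+|\s+$//g;
    my %keys;
    while (my $line = <$fh>) {
        my ($key) = split /\t/, $line;
        next unless defined $key;
        $key =~ s/^\s+|\s+$//g;
        $keys{$key} = 1 if length $key;    # skip blank lines
    }
    return ($key_id, \%keys);
}

# Step 3 setup (hypothetical helper): index of the header field equal to $name, or -1.
sub find_column {
    my ($header, $name) = @_;
    my @names = split /,/, $header;
    s/^\s+|\s+$//g for @names;    # also removes the trailing newline
    for my $i (0 .. $#names) {
        return $i if $names[$i] eq $name;
    }
    return -1;
}

# Made-up inputs, read through an in-memory filehandle.
my $key_text = "NSC\tcomment\n740\tfirst\n3053\tsecond\n";
open my $key_fh, '<', \$key_text or die "cannot open key text: $!";
my ($key_id, $keys) = read_keys($key_fh);

my $col = find_column("CELL,PANEL,NSC\n", $key_id);
print "$key_id is column $col; ", scalar(keys %$keys), " keys\n";
```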
My script is as follows:
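#! perl -w
use strict;
use warnings;

# Remove leading and trailing whitespace (this also strips "\n" and "\r").
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $forextract = shift;  # IDs to be extracted, 1 per row, further info tab delimited;
                         # header[0] is matched against the header of dtp_all
my $dtp_all    = shift;  # csv CANCER60xxx.lis....
my $dtp_my     = shift;  # csv output
die "Usage: $0 ids_file input_csv output_csv\n" unless defined $dtp_my;

print "Warning: The file with items to extract has to have a header - the first column name will be matched.\n";

open(my $ids_fh, '<', $forextract) or die "Opening $forextract failed: $!\n";
open(my $all_fh, '<', $dtp_all)    or die "Opening $dtp_all failed: $!\n";
open(my $my_fh,  '>', $dtp_my)     or die "Opening $dtp_my failed: $!\n";

# First row: the column name to match. Following rows: the IDs, stored as hash
# keys so each test is a single exists() (filling @ids from index 1 left an
# undefined $ids[0] that produced "uninitialized value" warnings).
my $id_header = <$ids_fh>;
die "$forextract is empty.\n" unless defined $id_header;
my $myID = trim((split /\t/, $id_header)[0] // '');
my %ids;
while (my $row = <$ids_fh>) {
    my $id = trim((split /\t/, $row)[0] // '');
    $ids{$id} = 1 if length $id;    # ignore blank lines
}
close($ids_fh);

# Copy the header, then find the column whose name equals $myID.
# trim() also removes the newline from the last column name, which would
# otherwise never compare equal.
my $line = <$all_fh>;
die "$dtp_all is empty.\n" unless defined $line;
print {$my_fh} $line;
my @colnames = map { trim($_) } split /,/, $line;
my $colN = -1;
for my $i (0 .. $#colnames) {
    if ($colnames[$i] eq $myID) { $colN = $i; last; }
}
die "Column $myID not found.\n" if $colN == -1;

# Parse the input line by line; when the ID matches, write the line into the output.
# A plain split on "," assumes that no field contains a quoted comma.
while ($line = <$all_fh>) {
    my @columns = split /,/, $line, -1;          # -1 keeps trailing empty fields
    my $NSC = trim($columns[$colN] // '');
    print {$my_fh} $line if exists $ids{$NSC};   # each line is printed at most once
}
close($my_fh) or die "Not completed $dtp_my: $!\n";
close($all_fh);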
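The script splits every line with `split /,/`, which assumes that no CSV field contains a comma. If the file being simplified quotes fields such as `"Smith, John"`, the core module Text::ParseWords can split such lines instead. Below is a minimal sketch with a made-up line.

Note that `parse_line` treats apostrophes as quote characters and returns an empty list for a line with an unbalanced quote. For general CSV, the CPAN module Text::CSV is the more robust choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical CSV line: the second field is quoted and contains a comma.
my $line = qq{740,"Smith, John",NSC\n};
chomp $line;    # parse_line would otherwise keep "\n" in the last field

my @naive  = split /,/, $line;           # breaks the quoted field in two
my @fields = parse_line(',', 0, $line);  # 0 = strip the surrounding quotes

print scalar(@naive), " vs ", scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

Applied to the script, this would mean `chomp`ing each line and calling `parse_line(',', 0, $line)` in place of `split /,/, $line`. The same parsing would be used for the header line so that quoted column names still match `$myID`.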