Hello Monks,
I'm trying to solve a very annoying problem I've been having lately.
I have a file containing a tab-separated "table" with thousands of rows and dozens of columns, and a second tab-separated text file that pairs each name used in the table with its, let's say, "translation", e.g.:
Table_name1 New_name1
Table_name2 New_name2
...
Table_name3 New_name3
The table has the following format:
Table_name1 Table_name43 Table_name17 Table_name1245 ...
Table_name2 Table_name4 Table_name37 Table_name125 ...
Table_name3 Table_name51 Table_name69 Table_name342 ...
...
Any name can appear at any position, and all the names in the table are present in the text file to be "translated".
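To make the intended mapping concrete, here is a minimal sketch of the translation as a per-field hash lookup. It is not the script in question: the in-memory sample lines and the names %new_name_for and translate_row are assumptions made up for illustration, and real input would be read from the two files instead.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two tab-separated files.
my @db_lines    = ("Table_name1\tNew_name1", "Table_name2\tNew_name2");
my @table_lines = ("Table_name1\tTable_name43\tTable_name2");

# Build a lookup: table name => new name.
my %new_name_for;
for my $line (@db_lines) {
    my ($old, $new) = split /\t/, $line, 2;
    $new_name_for{$old} = $new;
}

# Translate every field of a row by exact key lookup.
# Names without an entry in the lookup are kept unchanged.
sub translate_row {
    my ($row) = @_;
    my @fields = split /\t/, $row, -1;
    return join "\t",
        map { exists $new_name_for{$_} ? $new_name_for{$_} : $_ } @fields;
}

print translate_row($_), "\n" for @table_lines;
```

Because each field is compared as a whole key, a short name cannot accidentally match inside a longer one, and the table is read only once rather than once per name.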
I tried the following script, which calls a one-liner to edit the table file in place, replacing each entry from the text file wherever it is found.
My problem is that the replacements are not done correctly and I end up with a table full of replicated values that don't correspond to the original. I think this may be some issue with the for loop that contains the call to the one-liner, but I can't seem to fix it.
Maybe the code is easier to understand than my attempted explanation of the issue... so here it is:
#!/usr/bin/perl -w
use strict;
use Getopt::Long;

#usage example: perl GetbackIDs.pl -p /path_to_files -e [table file extension]
#requires a table file and a "IDs database" in ".txt" format that share their name

my ($path, $ext);
GetOptions(
    'path=s'      => \$path,
    'extension=s' => \$ext,
);
print "$path\n";
chdir $path or die "ERROR: Unable to enter $path: $!\n";
opendir (TEMP , ".");
my @files = readdir (TEMP);
closedir TEMP;
print "@files\n";
my $name;
my @db;
for my $file (@files) {
    if($file=~/(\w+).$ext/){
        $name = "$1";
        print"This is the Filename: $file\n";
        open (INFILE, "$file") || die ("cannot open input file");
        chomp(my @data = <INFILE>);
        my$file2= "$name.bd";
        print"This is the DBname:$file2\n";
        open (DB, "$file2") || die ("cannot open input file");
        chomp(@db = <DB>);
    }
    #Edition "on the fly" via One-Liner
    for(@db){
        my ($dbid,$firstid) = split(/\t/, $_);
        chomp $firstid;
        print"This is my $dbid and its $firstid\n";
        ##ONELINER
        #if id matches, replace id
        my$susti=`perl -pi -e 's/$dbid/$firstid/g' $name.$ext`;
    }
}
Examples of data
#Database of table names and new names
Aspergillus_clavatus_1	XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]
Aspergillus_fumigatus_2	XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
Aspergillus_flavus_3	XP_001276726.1 tyrosinase central domain protein [Aspergillus flavus NRRL 1]
Aspergillus_terreus_4	XP_001276738.1 endoglucanase, putative [Aspergillus terreus NRRL 1]

#Lines of the table to be renamed
Aspergillus_clavatus_1	Aspergillus_flavus_198	Aspergillus_terreus_166	Aspergillus_fumigatus_2
Aspergillus_clavatus_1	Aspergillus_flavus_3	Aspergillus_terreus_4	Aspergillus_fumigatus_2
Aspergillus_clavatus_3	Aspergillus_flavus_198	Aspergillus_terreu_166	Aspergillus_fumigatus_16

#Expected result (See that in some cases there's no replacement to be done, if the ID is not present in the names "database" file)
XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]	Aspergillus_flavus_198	Aspergillus_terreus_166	XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]	XP_001276726.1 tyrosinase central domain protein [Aspergillus flavus NRRL 1]	XP_001276738.1 endoglucanase, putative [Aspergillus terreus NRRL 1]	XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
Aspergillus_clavatus_3	Aspergillus_flavus_198	Aspergillus_terreu_166	Aspergillus_fumigatus_16
Thanks in advance for your help
Best

I'd appreciate your advice on how to retitle the post, because I do not think it is illustrative enough in its current form...
*Update*
As suggested by BrowserUk, I found out that after a few records the names may overlap, e.g. Aspergillus_fumigatus_1 is a prefix of Aspergillus_fumigatus_10 and Aspergillus_fumigatus_17, so an unanchored s/// for the shorter name also matches inside the longer ones.
So I guess that is the main source of error during translation.
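The overlap can be reproduced in isolation. The following is only a minimal sketch, assuming a made-up two-column row and the placeholder replacement "NEW"; it contrasts the unanchored substitution used in the one-liner with one that quotes the ID and refuses to match when a word character sits on either side.

```perl
use strict;
use warnings;

# Hypothetical row: the first ID is a prefix of the second.
my $row = "Aspergillus_fumigatus_1\tAspergillus_fumigatus_10";
my $id  = "Aspergillus_fumigatus_1";

# Unanchored, as in the one-liner: the short ID also matches inside the long one.
(my $unanchored = $row) =~ s/$id/NEW/g;

# Anchored: \Q...\E quotes regex metacharacters in the ID, and the lookarounds
# require that no word character precedes or follows the match.
(my $anchored = $row) =~ s/(?<!\w)\Q$id\E(?!\w)/NEW/g;

print "unanchored: $unanchored\n";
print "anchored:   $anchored\n";
```

In the unanchored case the second column is left with a stray trailing "0" after the replacement, which is the kind of corrupted value described above; in the anchored case the second column is untouched.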
In reply to Replace table values from text database by Alfumao