Hello Monks,

I'm trying to solve a very annoying problem I've been having lately.

I have a file containing a tab-separated "table" with thousands of rows and dozens of columns, and a second tab-separated text file that pairs each name used in the table with its, let's say, "translation", e.g.:

Table_name1 New_name1

Table_name2 New_name2

...

Table_name3 New_name3

The table has the following format:

Table_name1 Table_name43 Table_name17 Table_name1245 ...

Table_name2 Table_name4 Table_name37 Table_name125 ...

Table_name3 Table_name51 Table_name69 Table_name342 ...

...

Any name can appear at any position, and all the names in the table are present in the text file to be "translated".
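For illustration, the name/"translation" file described above could be loaded into a hash along these lines. This is only a sketch: the helper `load_mapping` and the sample rows are made up, the file is simulated with an in-memory filehandle, and it assumes that everything after the first tab on a row is the new name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample rows standing in for the real mapping file.
my $mapping_text = "Table_name1\tNew_name1\nTable_name2\tNew_name2\n";

# Hypothetical helper: read "old<TAB>new" rows into a hash reference.
sub load_mapping {
    my ($fh) = @_;
    my %map;
    while (my $row = <$fh>) {
        chomp $row;
        next unless length $row;                 # skip empty lines
        my ($old, $new) = split /\t/, $row, 2;   # assumption: split at first tab only
        next unless defined $new;                # skip rows without a tab
        $map{$old} = $new;
    }
    return \%map;
}

# Open the string as if it were a file, so the sketch runs without real files.
open my $fh, '<', \$mapping_text or die "Cannot open in-memory file: $!";
my $map = load_mapping($fh);
close $fh;

print "$_ => $map->{$_}\n" for sort keys %$map;
```

With a real file, the `open` call would take the file name instead of the scalar reference.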

I tried the following script, which calls a one-liner to edit the table file in place, replacing each entry from the text file wherever it is found.

My problem is that the replacements are not done correctly: I end up with a table full of duplicated values that don't correspond to the original. I think this may be some issue with the for loop that contains the call to the one-liner, but I can't seem to fix it.

Maybe the code is easier to understand than my explanation of the issue, so here it is:

#!/usr/bin/perl -w
use strict;
use Getopt::Long;

#usage example: perl GetbackIDs.pl -p /path_to_files -e [table file extension]
#requires a table file and a "IDs database" in ".txt" format that share their name
my ($path, $ext);
GetOptions(
    'path=s' => \$path,
    'extension=s' => \$ext,
);
print "$path\n";
chdir $path or die "ERROR: Unable to enter $path: $!\n";
opendir (TEMP , ".");
my @files = readdir (TEMP);
closedir TEMP;
print "@files\n";
my $name;
my @db;
for my $file (@files) {
    if($file=~/(\w+).$ext/){
        $name = "$1";
        print"This is the Filename: $file\n";
        open (INFILE, "$file") || die ("cannot open input file");
        chomp(my @data = <INFILE>);
        my$file2= "$name.bd";
        print"This is the DBname:$file2\n";
        open (DB, "$file2") || die ("cannot open input file");
        chomp(@db = <DB>);
    }
    #Edition "on the fly" via One-Liner
    for(@db){
        my ($dbid,$firstid) = split(/\t/, $_);
        chomp $firstid;
        print"This is my $dbid and its $firstid\n";
        ##ONELINER
        #if id matches, replace id
        my$susti=`perl -pi -e 's/$dbid/$firstid/g' $name.$ext`;
    }
}

Examples of data

#Database of table names and new names
Aspergillus_clavatus_1 XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]
Aspergillus_fumigatus_2 XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
Aspergillus_flavus_3 XP_001276726.1 tyrosinase central domain protein [Aspergillus flavus NRRL 1]
Aspergillus_terreus_4 XP_001276738.1 endoglucanase, putative [Aspergillus terreus NRRL 1]

#Lines of the table to be renamed
Aspergillus_clavatus_1 Aspergillus_flavus_198 Aspergillus_terreus_166 Aspergillus_fumigatus_2
Aspergillus_clavatus_1 Aspergillus_flavus_3 Aspergillus_terreus_4 Aspergillus_fumigatus_2
Aspergillus_clavatus_3 Aspergillus_flavus_198 Aspergillus_terreu_166 Aspergillus_fumigatus_16

#Expected result (See that in some cases there's no replacement to be done, if the ID is not present in the names "database" file)
XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1] Aspergillus_flavus_198 Aspergillus_terreus_166 XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1] XP_001276726.1 tyrosinase central domain protein [Aspergillus flavus NRRL 1] XP_001276738.1 endoglucanase, putative [Aspergillus terreus NRRL 1] XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
Aspergillus_clavatus_3 Aspergillus_flavus_198 Aspergillus_terreu_166 Aspergillus_fumigatus_16

Thanks in advance for your help

Best

I'd also appreciate your advice on a better title for this post, because I don't think the current one is descriptive enough.

*Update*

As suggested by BrowserUk, I found out that after a few records the names may overlap, e.g. Aspergillus_fumigatus_1 is a prefix of Aspergillus_fumigatus_10 and Aspergillus_fumigatus_17, so a substitution for the shorter name also matches inside the longer ones.

So I guess that is the main source of error during translation.
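To make the overlap concrete, here is a minimal sketch (not the script above) that contrasts a plain substring substitution with a whole-field hash lookup. The hash `%new_name_for`, the helper `translate_line`, and the values `NEW_A` and `NEW_B` are made up for the example; it assumes the table fields are separated by tabs and that a name should only be replaced when it is exactly equal to a field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping: table name => new name (illustrative values only).
my %new_name_for = (
    'Aspergillus_fumigatus_1'  => 'NEW_A',
    'Aspergillus_fumigatus_10' => 'NEW_B',
);

my $line = "Aspergillus_fumigatus_10\tAspergillus_fumigatus_1";

# Substring substitution, similar to what the one-liner does: the shorter
# name also matches the beginning of the longer one.
(my $substring = $line) =~ s/Aspergillus_fumigatus_1/NEW_A/g;
print "substring:   $substring\n";

# Whole-field lookup: split on tabs and replace a field only when it is
# exactly a key of the hash; unknown fields are left untouched.
sub translate_line {
    my ($text, $map) = @_;
    my @fields = split /\t/, $text, -1;   # -1 keeps trailing empty fields
    for my $field (@fields) {
        $field = $map->{$field} if exists $map->{$field};
    }
    return join "\t", @fields;
}

my $whole_field = translate_line($line, \%new_name_for);
print "whole field: $whole_field\n";
```

In the first result the longer name is damaged (it becomes `NEW_A0`), while the whole-field version maps each name to its own entry. A single pass over the table with a hash also avoids re-running a substitution over the file once per name.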


In reply to Replace table values from text database by Alfumao
