
Hello Monks,

I'm trying to solve a very annoying problem I've been having lately.

I have a file containing a tab-separated "table" with thousands of rows and dozens of columns, and a second tab-separated text file that pairs each name used in the table with its, let's say, "translation", e.g.:

Table_name1 New_name1

Table_name2 New_name2

...

Table_name3 New_name3

The table has the following format:

Table_name1 Table_name43 Table_name17 Table_name1245 ...

Table_name2 Table_name4 Table_name37 Table_name125 ...

Table_name3 Table_name51 Table_name69 Table_name342 ...

...

Any name can appear at any position, and all the names in the table are present in the text file to be "translated".

I tried the following script, which calls a one-liner to edit the table file in place, replacing each entry from the text file wherever it is found.

My problem is that the replacements are not done correctly, and I end up with a table full of duplicated values that don't correspond to the original. I think this may be an issue with the for loop that contains the call to the one-liner, but I can't seem to fix it.

Maybe the code is easier to understand than my attempt at explaining the issue, so here it is:

#!/usr/bin/perl -w
    use strict;
    use Getopt::Long;
     
    #usage example: perl GetbackIDs.pl -p /path_to_files -e [table file extension]
    #requires a table file and a "IDs database" in ".txt" format that share their name
    my ($path, $ext);
    GetOptions(
        'path=s'      => \$path,
        'extension=s' => \$ext,
        );
       
    print "$path\n";
    chdir $path or die "ERROR: Unable to enter $path: $!\n";
    opendir (TEMP , ".");
    my @files = readdir (TEMP);
    closedir TEMP;
    print "@files\n";
     
    my $name;
    my @db;
    for my $file (@files) {
        if($file=~/(\w+).$ext/){
            $name = "$1";
            print"This is the Filename: $file\n";
            open (INFILE, "$file") || die ("cannot open input file");
            chomp(my @data = <INFILE>);
            my$file2= "$name.bd";
            print"This is the DBname:$file2\n";
            open (DB, "$file2") || die ("cannot open input file");
            chomp(@db = <DB>);    
        }    
    #Edition "on the fly" via One-Liner
    for(@db){
            my ($dbid,$firstid) = split(/\t/, $_);
            chomp $firstid;
            print"This is my $dbid and its $firstid\n";
            ##ONELINER #if id matches, replace id
            my$susti=`perl -pi -e 's/$dbid/$firstid/g' $name.$ext`;
            }
        }
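
One detail of the script above that may be worth checking: the pattern `/(\w+).$ext/` is unanchored and its dot is unescaped, and the `for(@db)` loop sits outside the `if` block, so it runs again for every directory entry once `$name` and `@db` have been set. A minimal sketch of a stricter file test follows. The sub name `table_basename` and the file names in the loop are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Return the base name if $file is exactly "<name>.<ext>", otherwise undef.
# \A and \z anchor the match to the whole string; \Q...\E makes the
# extension match literally instead of as a regex.
sub table_basename {
    my ($file, $ext) = @_;
    return $file =~ /\A(\w+)\.\Q$ext\E\z/ ? $1 : undef;
}

# Hypothetical directory listing; 'tab' stands in for the -e argument.
for my $file ('.', '..', 'fungi.tab', 'fungi.bd', 'fungi_tab', 'fungi.tab.bak') {
    my $name = table_basename($file, 'tab');
    printf "%-14s => %s\n", $file,
        defined $name ? "table '$name', database '$name.bd'" : 'skipped';
}
```

With a test like this, the replacement step could be moved inside the `if` block so that it only runs once per table file.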

Examples of data

#Database of table names and new names

Aspergillus_clavatus_1    XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]
Aspergillus_fumigatus_2    XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
Aspergillus_flavus_3    XP_001276726.1 tyrosinase central domain protein [Aspergillus flavus NRRL 1]
Aspergillus_terreus_4    XP_001276738.1 endoglucanase, putative [Aspergillus terreus NRRL 1]

#Lines of the table to be renamed

Aspergillus_clavatus_1    Aspergillus_flavus_198    Aspergillus_terreus_166    Aspergillus_fumigatus_2
Aspergillus_clavatus_1    Aspergillus_flavus_3    Aspergillus_terreus_4    Aspergillus_fumigatus_2
Aspergillus_clavatus_3    Aspergillus_flavus_198    Aspergillus_terreu_166    Aspergillus_fumigatus_16


#Expected result (note that in some cases there is no replacement to be done, because the ID is not present in the names "database" file)

XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]    Aspergillus_flavus_198    Aspergillus_terreus_166    XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
XP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]    XP_001276726.1 tyrosinase central domain protein [Aspergillus flavus NRRL 1]    XP_001276738.1 endoglucanase, putative [Aspergillus terreus NRRL 1]    XP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
Aspergillus_clavatus_3    Aspergillus_flavus_198    Aspergillus_terreu_166    Aspergillus_fumigatus_16
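
For comparison with the expected result, here is a minimal sketch of one possible approach: load the names "database" into a hash once, then split each table row on tabs and look up every field as a whole. It is not the script from the post. The sub names `load_names` and `translate_row` are hypothetical, and the sketch assumes that the first tab on each database line separates the ID from its replacement text.

```perl
use strict;
use warnings;

# Build a lookup hash from "old_name<TAB>new_name" lines.
sub load_names {
    my (@lines) = @_;
    my %new_name_for;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        # Limit of 2: only the first tab splits the ID from its replacement.
        my ($old, $new) = split /\t/, $line, 2;
        next unless defined $new;
        $new_name_for{$old} = $new;
    }
    return \%new_name_for;
}

# Translate one tab-separated row. Each field is looked up as a whole,
# so an ID can never match inside a longer ID.
sub translate_row {
    my ($row, $new_name_for) = @_;
    chomp $row;
    my @fields = split /\t/, $row, -1;    # -1 keeps trailing empty fields
    return join "\t",
        map { exists $new_name_for->{$_} ? $new_name_for->{$_} : $_ } @fields;
}

# Two database entries and the first table row from the sample data.
my $names = load_names(
    "Aspergillus_clavatus_1\tXP_001276684.1 pectate lyase, putative [Aspergillus clavatus NRRL 1]",
    "Aspergillus_fumigatus_2\tXP_001276694.1 conserved hypothetical protein [Aspergillus fumigatus NRRL 1]",
);
my $row = join "\t", qw(Aspergillus_clavatus_1 Aspergillus_flavus_198
                        Aspergillus_terreus_166 Aspergillus_fumigatus_2);
print translate_row($row, $names), "\n";
```

Because each row is read and written once, this also avoids launching a separate `perl -pi` process for every database entry.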

Thanks in advance for your help

Best

I'd also appreciate your counsel on a better title for the post, because I do not think the current one is descriptive enough...

*Update*

As suggested by BrowserUk, I found out that after a few records names may overlap, e.g. Aspergillus_fumigatus_1 is a prefix of Aspergillus_fumigatus_10 and Aspergillus_fumigatus_17, so a substitution for the shorter name also matches inside the longer ones.

So I guess that is the main source of error during translation.
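
The overlap can be reproduced in a few lines. The sketch below is only an illustration: the replacement text `NEW_NAME` and both sub names are made up. It compares an unanchored substitution with one that requires the ID to fill a whole tab-delimited field.

```perl
use strict;
use warnings;

# Hypothetical replacement text; the IDs are the ones from the update.
my $short_id = 'Aspergillus_fumigatus_1';
my $new_name = 'NEW_NAME';
my $row      = join "\t", qw(Aspergillus_fumigatus_1
                             Aspergillus_fumigatus_10
                             Aspergillus_fumigatus_17);

# Unanchored: also rewrites the start of the longer IDs.
sub replace_unanchored {
    my ($text, $id, $new) = @_;
    $text =~ s/\Q$id\E/$new/g;
    return $text;
}

# Anchored: the ID must be preceded and followed by a tab, a newline,
# or the start/end of the string, i.e. it must be a complete field.
sub replace_whole_field {
    my ($text, $id, $new) = @_;
    $text =~ s/(?<![^\t\n])\Q$id\E(?![^\t\n])/$new/g;
    return $text;
}

print replace_unanchored($row, $short_id, $new_name), "\n";
# fields become: NEW_NAME, NEW_NAME0, NEW_NAME7
print replace_whole_field($row, $short_id, $new_name), "\n";
# fields become: NEW_NAME, Aspergillus_fumigatus_10, Aspergillus_fumigatus_17
```

`\Q...\E` is used so that any regex metacharacters in an ID are matched literally.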

In reply to Replace table values from text database by Alfumao
