
How can I upload a sample of the test files to this thread?

Re^3: Replace table values from text database
by 1nickt (Canon) on Mar 14, 2016 at 14:29 UTC

      I hope this helps

      #Database of table names and new names
      Aspergillus_clavatus_1	XP_001276684.1	pectate lyase, putative [Aspergillus clavatus NRRL 1]
      Aspergillus_fumigatus_2	XP_001276694.1	conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
      Aspergillus_flavus_3	XP_001276726.1	tyrosinase central domain protein [Aspergillus flavus NRRL 1]
      Aspergillus_terreus_4	XP_001276738.1	endoglucanase, putative [Aspergillus terreus NRRL 1]

      #Lines of the table to be renamed
      Aspergillus_clavatus_1	Aspergillus_flavus_198	Aspergillus_terreus_166	Aspergillus_fumigatus_2
      Aspergillus_clavatus_1	Aspergillus_flavus_3	Aspergillus_terreus_4	Aspergillus_fumigatus_2
      Aspergillus_clavatus_3	Aspergillus_flavus_198	Aspergillus_terreu_166	Aspergillus_fumigatus_16

      #Expected result (note that in some cases there is no replacement to be done, when the ID is not present in the names "database" file)
      XP_001276684.1	pectate lyase, putative [Aspergillus clavatus NRRL 1]	Aspergillus_flavus_198	Aspergillus_terreus_166	XP_001276694.1	conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
      XP_001276684.1	pectate lyase, putative [Aspergillus clavatus NRRL 1]	XP_001276726.1	tyrosinase central domain protein [Aspergillus flavus NRRL 1]	XP_001276738.1	endoglucanase, putative [Aspergillus terreus NRRL 1]	XP_001276694.1	conserved hypothetical protein [Aspergillus fumigatus NRRL 1]
      Aspergillus_clavatus_3	Aspergillus_flavus_198	Aspergillus_terreu_166	Aspergillus_fumigatus_16
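The expected result above replaces every field that has an entry in the names database, not only the first one: in the first table line the trailing Aspergillus_fumigatus_2 becomes XP_001276694.1 and its description, and in the second line all four names are replaced. A minimal sketch of that per-field rule, assuming tab-separated fields and a tab between each ID and its description (the flattened sample does not show which separator is used); the variable names %names and @rows are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Names "database": old name => replacement (ID and description, tab-joined).
# Hard-coded here for the demo; a real script would load it from the first file.
my %names = (
    'Aspergillus_clavatus_1'  => "XP_001276684.1\tpectate lyase, putative [Aspergillus clavatus NRRL 1]",
    'Aspergillus_fumigatus_2' => "XP_001276694.1\tconserved hypothetical protein [Aspergillus fumigatus NRRL 1]",
    'Aspergillus_flavus_3'    => "XP_001276726.1\ttyrosinase central domain protein [Aspergillus flavus NRRL 1]",
    'Aspergillus_terreus_4'   => "XP_001276738.1\tendoglucanase, putative [Aspergillus terreus NRRL 1]",
);

# Rows of the table to be renamed, already split on tabs
my @rows = (
    [qw(Aspergillus_clavatus_1 Aspergillus_flavus_198 Aspergillus_terreus_166 Aspergillus_fumigatus_2)],
    [qw(Aspergillus_clavatus_1 Aspergillus_flavus_3   Aspergillus_terreus_4   Aspergillus_fumigatus_2)],
    [qw(Aspergillus_clavatus_3 Aspergillus_flavus_198 Aspergillus_terreu_166  Aspergillus_fumigatus_16)],
);

# Look up every field, not just the first; unknown names pass through unchanged
my @results;
for my $row (@rows) {
    push @results, join "\t", map { exists $names{$_} ? $names{$_} : $_ } @$row;
}

print "$_\n" for @results;
```

It prints three lines: the first two match the expected result, and the third passes through unchanged because none of its names are in the database.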

        Well, I'm still not sure that I really grasp what you're trying to do, but maybe this will be of some help. It changes a line of the second file if the line's first field is also the first field of a line in the first file (combining data from the two files), and leaves the line unchanged otherwise:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use feature 'say';
        use Data::Dumper;

        # Records from the first file: old name, new ID, description
        my @data1 = (
            ['Aspergillus_clavatus_1',  'XP_001276684.1', 'pectate lyase, putative [Aspergillus clavatus NRRL 1]'],
            ['Aspergillus_fumigatus_2', 'XP_001276694.1', 'conserved hypothetical protein [Aspergillus fumigatus NRRL 1]'],
            ['Aspergillus_flavus_3',    'XP_001276726.1', 'tyrosinase central domain protein [Aspergillus flavus NRRL 1]'],
            ['Aspergillus_terreus_4',   'XP_001276738.1', 'endoglucanase, putative [Aspergillus terreus NRRL 1]']
        );

        # Build a lookup hash keyed on the old name
        my %data1;
        for my $record ( @data1 ) {
            my ( $old, $new, $text ) = @{ $record };
            $data1{ $old } = { new => $new, text => $text };
        }

        # Rows from the second file (the table to be renamed)
        my @data2 = (
            ['Aspergillus_clavatus_1', 'Aspergillus_flavus_198', 'Aspergillus_terreus_166', 'Aspergillus_fumigatus_2'],
            ['Aspergillus_clavatus_1', 'Aspergillus_flavus_3',   'Aspergillus_terreus_4',   'Aspergillus_fumigatus_2'],
            ['Aspergillus_clavatus_3', 'Aspergillus_flavus_198', 'Aspergillus_terreu_166',  'Aspergillus_fumigatus_16']
        );

        my @results;
        for my $row ( @data2 ) {
            # Only the first field is looked up; the remaining fields are kept as they are
            my $lookup = shift @{ $row };
            push @results, exists $data1{ $lookup }
                ? join "\t", $data1{ $lookup }->{'new'}, $data1{ $lookup }->{'text'}, join "\t", @{ $row }
                : join "\t", $lookup, join "\t", @{ $row };
        }

        say Dumper \@results;

        __END__
        Output:
        $VAR1 = [
                  'XP_001276684.1	pectate lyase, putative [Aspergillus clavatus NRRL 1]	Aspergillus_flavus_198	Aspergillus_terreus_166	Aspergillus_fumigatus_2',
                  'XP_001276684.1	pectate lyase, putative [Aspergillus clavatus NRRL 1]	Aspergillus_flavus_3	Aspergillus_terreus_4	Aspergillus_fumigatus_2',
                  'Aspergillus_clavatus_3	Aspergillus_flavus_198	Aspergillus_terreu_166	Aspergillus_fumigatus_16'
                ];
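The reply hard-codes both files as arrays of array references. A minimal sketch of the same first-field logic reading the data line by line instead, assuming both files are tab-separated and that lines starting with # are comments, as in the sample posted above. In-memory filehandles (open on a scalar reference) stand in for the real files; the file name names.txt in the comment is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for the two files. A real script would open them by name, e.g.
#   open my $db_fh, '<', 'names.txt' or die "names.txt: $!";   # hypothetical file name
# Fields are assumed to be tab-separated; lines starting with '#' are comments.
my $db_text = "#Database of table names and new names\n"
    . "Aspergillus_clavatus_1\tXP_001276684.1\tpectate lyase, putative [Aspergillus clavatus NRRL 1]\n"
    . "Aspergillus_fumigatus_2\tXP_001276694.1\tconserved hypothetical protein [Aspergillus fumigatus NRRL 1]\n"
    . "Aspergillus_flavus_3\tXP_001276726.1\ttyrosinase central domain protein [Aspergillus flavus NRRL 1]\n"
    . "Aspergillus_terreus_4\tXP_001276738.1\tendoglucanase, putative [Aspergillus terreus NRRL 1]\n";

my $table_text = "#Lines of the table to be renamed\n"
    . "Aspergillus_clavatus_1\tAspergillus_flavus_198\tAspergillus_terreus_166\tAspergillus_fumigatus_2\n"
    . "Aspergillus_clavatus_1\tAspergillus_flavus_3\tAspergillus_terreus_4\tAspergillus_fumigatus_2\n"
    . "Aspergillus_clavatus_3\tAspergillus_flavus_198\tAspergillus_terreu_166\tAspergillus_fumigatus_16\n";

# Load the names database into a hash keyed on the old name
open my $db_fh, '<', \$db_text or die "Cannot read names database: $!";
my %data1;
while ( my $line = <$db_fh> ) {
    chomp $line;
    next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines
    my ( $old, $new, $text ) = split /\t/, $line, 3;
    $data1{ $old } = { new => $new, text => $text };
}
close $db_fh;

# Rewrite each table line, looking up only its first field
open my $table_fh, '<', \$table_text or die "Cannot read table: $!";
my @results;
while ( my $line = <$table_fh> ) {
    chomp $line;
    next if $line =~ /^#/ || $line !~ /\S/;
    my ( $lookup, @rest ) = split /\t/, $line;
    my @head = exists $data1{ $lookup }
        ? ( $data1{ $lookup }{new}, $data1{ $lookup }{text} )
        : ( $lookup );
    push @results, join "\t", @head, @rest;
}
close $table_fh;

print "$_\n" for @results;
```

It prints the same three tab-separated lines as the Dumper output above. Unpacking each line with `my ( $lookup, @rest ) = split /\t/, $line` also means nothing is `shift`ed out of a shared data structure.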

        Hope this helps!


        The way forward always starts with a minimal test.