McCaslin has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks, I am *extremely* new to Perl and am attempting to use it for a bioinformatics problem. I have a large chunk of text that includes information about a group of genes, and all of the gene names are in the wrong nomenclature. I want to keep my text file intact but replace each instance of an incorrect name with the corresponding correct name. I've put together a hash with the incorrect (current) name of each gene as the keys and the correct (new) name as the values. I figured I could read my text file into an array and use some sort of regular expression to find matches to the hash keys and replace them with the respective values. I'm not sure how to do this part, and it's a little beyond the scope of the Perl manuals I've been using. This is what I have so far:

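    use warnings;
    use strict;

    my $infile1 = "InParanoid_Modified.txt";   # text to translate
    my $infile2 = "IPArray.txt";   # assumed layout: ENSG id, then its ENSP ids
    my $outfile = "IPEnsembl.txt";

    open(my $in1, '<', $infile1) or die "$infile1: $!";
    open(my $in2, '<', $infile2) or die "$infile2: $!";
    open(my $out, '>', $outfile) or die "$outfile: $!";

    # Build the dictionary: old name (ENSP) => new name (ENSG).
    my %Ensembl;
    while (my $line = <$in2>) {
        my @values = split ' ', $line;   # split ' ' also drops the newline
        my $ENSG = shift @values;
        $Ensembl{$_} = $ENSG for @values;
    }
    die "No IDs read from $infile2\n" unless %Ensembl;

    # One regex matching any old name: longest first, metacharacters escaped.
    my $pat = join '|', map quotemeta, sort { length $b <=> length $a } keys %Ensembl;

    while (my $line = <$in1>) {
        print $out translate($line);   # the line keeps its own newline
    }
    close $in1; close $in2; close $out;

    # Replace every old name in a line with its new name.
    sub translate {
        my ($text) = @_;
        $text =~ s/($pat)/$Ensembl{$1}/g;
        return $text;
    }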
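For comparison, the plain "hash as a dictionary" idea from the question can also be done word by word: look each whitespace-separated token up in the hash and swap it only if it is a key. This is a minimal sketch, not the poster's code; the hash `%new_name`, the sub `rename_tokens` and the gene names are hypothetical placeholders, and it assumes gene names are always separated by whitespace (never glued to punctuation).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: current (old) name => corrected (new) name.
my %new_name = (
    'geneA_old' => 'geneA',
    'geneB_old' => 'geneB',
);

# Replace each run of non-whitespace characters that is a key in %new_name.
# The /e flag evaluates the replacement side as Perl code for every match;
# tokens with no entry are put back unchanged, and the whitespace between
# tokens (tabs, newline) is never touched.
sub rename_tokens {
    my ($line) = @_;
    $line =~ s/(\S+)/exists $new_name{$1} ? $new_name{$1} : $1/ge;
    return $line;
}

print rename_tokens("geneA_old\tgeneB_old\t0.87\n");
```

This does one hash lookup per token instead of building a large alternation, so it stays fast with many thousands of keys, but it will miss a name embedded inside a longer token such as `geneA_old,`.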

Any help would be greatly appreciated!

Re: using a hash as a sort of dictionary
by ikegami (Patriarch) on Jul 26, 2010 at 19:18 UTC
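    # Join every key, with regex metacharacters escaped, into one alternation.
    my $pat = join '|', map quotemeta, keys(%Ensembl);
    # Replace each matched key (in $_) with its value from the hash.
    s/($pat)/$Ensembl{$1}/g;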
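A self-contained sketch of the alternation approach above, run on made-up IDs. The IDs, the sample lines and the sub name `translate` are hypothetical. Sorting the keys longest first is an addition not in the reply: without it, hash order could let a short key such as `ENSP0002` match the front of `ENSP00021`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical old-name => new-name table (made-up IDs for illustration).
my %Ensembl = (
    'ENSP0001'  => 'ENSG0100',
    'ENSP0002'  => 'ENSG0200',
    'ENSP00021' => 'ENSG0300',
);

# Build one alternation from every key; quotemeta escapes any regex
# metacharacters, and sorting longest first makes the longer key win
# when one key is a prefix of another.
my $pat = join '|', map quotemeta,
    sort { length $b <=> length $a } keys %Ensembl;

sub translate {
    my ($text) = @_;
    $text =~ s/($pat)/$Ensembl{$1}/g;   # swap each matched key for its value
    return $text;
}

my @lines = ("ENSP0001 ENSP00021 0.95\n", "ENSP0002 unknown\n");
print translate($_) for @lines;
```

Because the whole line is passed through unchanged apart from the substitutions, the rest of the text file stays intact, which is what the question asked for.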
Re: using a hash as a sort of dictionary
by BioLion (Curate) on Jul 26, 2010 at 19:28 UTC

    Alternatively, you can avoid Perl altogether: there are several online tools that do exactly this, converting gene names between the various naming schemes. DAVID and IDConverter are just two examples!

    Hope this helps...

    Just a something something...
Re: using a hash as a sort of dictionary
by aquarium (Curate) on Jul 27, 2010 at 04:13 UTC
    There are some ready-made bioinformatics tools, and even Perl modules, for this.
    That said, this kind of multiple-replacement problem is quite easy to solve with the mark feature of Tcl/Tk text widgets, which should be available through Perl's Tcl/Tk bindings. Marks are like bookmarks within the text, and they move automatically as you insert or replace text.
    the hardest line to type correctly is: stty erase ^H