AWallBuilder has asked for the wisdom of the Perl Monks concerning the following question:
Dear all,

This is related to an earlier post. I have made progress, but I am stuck again. I am parsing a file (partial example below) and reading each record into a hash, %values. I want to extract the ID from the ECOGENE line (e.g. the line beginning DBLINKS - (ECOGENE). However, I think my %values hash will not hold a separate key/value pair for each of the DBLINKS lines, because they all share the same key. Is there any way I can extract just the one for ECOGENE? I tried one way in my code, but it is wrong; I now realize the problem is in how the record is read into the hash. Any help appreciated.
Portion of input file:

//
UNIQUE-ID - EG11751
TYPES - BC-5.5.2
TYPES - BC-1.7.9
TYPES - BC-5.5.1
COMMON-NAME - otsA
ACCESSION-1 - b1896
ACCESSION-2 - ECK1895
CENTISOME-POSITION - 42.636864
COMMENT-INTERNAL - 1/24/05 keseler removed pexA as synonym
COMPONENT-OF - COLI-K12-39
COMPONENT-OF - TU0-7722
COMPONENT-OF - TU00391
COMPONENT-OF - TU00312
DBLINKS - (ECOLIHUB "otsA" NIL |kr| 3474243543 NIL NIL)
DBLINKS - (REGULONDB "EG11751" NIL |kr| 3462030648 NIL NIL)
DBLINKS - (ASAP "ABE-0006318" NIL |paley| 3398447608 NIL NIL)
DBLINKS - (ECHOBASE "EB1701" NIL |pkarp| 3346767936 NIL NIL)
DBLINKS - (ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)
DBLINKS - (OU-MICROARRAY "b1896" NIL NIL NIL NIL NIL)
DBLINKS - (CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-40
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-37
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-33
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-49
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-44
LAST-UPDATE - 3609256889
LEFT-END-POSITION - 1978212
MEMBER-SORT-FN - NUMBERED-CLASS-SORT-FN
PRODUCT - TREHALOSE6PSYN-MONOMER
RIGHT-END-POSITION - 1979636
TRANSCRIPTION-DIRECTION - -
//
Code:

use strict;
use warnings;
use Data::Dumper;
##
my $inGeneDat=$ARGV[0] || "genes.dat";
open(IN,"<",$inGeneDat) || die "cannot open $inGeneDat\n";
##
my %HoNms;
{
    local $/ = '//';
    while(my $record=<IN>) {
        my %values = $record =~ /^(\S+)\s+-\s+(\S+)/mg;
        next unless exists $values{'UNIQUE-ID'} and exists $values{'ACCESSION-1'};
        # Your code using $values{'UNIQUE-ID'} and other values here
        my $cycID=$values{'UNIQUE-ID'};
        my $cycLoc=$values{'ACCESSION-1'};
        my $ECKLoc=$values{'ACCESSION-2'};
        my $EGLocL=$values{'DBLINKS'};
        $EGLocL=~/"(EG\S+)"/;
        my $EGLoc=$1;
        my $Nm=$values{'COMMON-NAME'};
        #
        $HoNms{$cycID} = {
            'acc1' => $cycLoc,
            'acc2' => $ECKLoc,
            'EG'=> $EGLoc,
            'nm' => $Nm
        };
    }
}
print Dumper(%HoNms);
close(IN);
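One possible way around the overwriting problem, offered as a minimal sketch and not as the answer given in the replies: a plain hash keeps only the last value seen for a repeated key, so $values{'DBLINKS'} ends up holding the final DBLINKS line of the record, and only its first word, because (\S+) stops at the first space. Storing every value in an array per key avoids both issues. The helper names parse_record and dblink_id below are made up for this illustration, and the sample record is a shortened copy of the input above.

```perl
use strict;
use warnings;

# Parse one record into a hash of arrays, so repeated keys
# (TYPES, DBLINKS, ...) keep every value instead of only the last.
sub parse_record {
    my ($record) = @_;
    my %values;
    for my $line (split /\n/, $record) {
        # Key, " - ", then the rest of the line (not just the first word).
        next unless $line =~ /^(\S+)\s+-\s+(.*\S)/;
        push @{ $values{$1} }, $2;
    }
    return \%values;
}

# Return the quoted ID from the DBLINKS entry for the given database,
# or undef if the record has no such entry.
sub dblink_id {
    my ($values, $db) = @_;
    for my $link (@{ $values->{DBLINKS} || [] }) {
        return $1 if $link =~ /^\(\Q$db\E\s+"([^"]+)"/;
    }
    return;
}

# Shortened sample record, copied from the input file above.
my $record = <<'END';
UNIQUE-ID - EG11751
COMMON-NAME - otsA
ACCESSION-1 - b1896
DBLINKS - (REGULONDB "EG11751" NIL |kr| 3462030648 NIL NIL)
DBLINKS - (ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)
DBLINKS - (CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)
END

my $values = parse_record($record);
my $eg     = dblink_id($values, 'ECOGENE');
print "ECOGENE: ", (defined $eg ? $eg : 'none'), "\n";
```

With this layout, single-valued fields are read as $values->{'UNIQUE-ID'}[0]. If only the ECOGENE ID is needed, a lighter alternative is to leave %values as it is and match the record text directly, for example my ($EGLoc) = $record =~ /^DBLINKS\s+-\s+\(ECOGENE\s+"([^"]+)"/m; which leaves $EGLoc undefined when the record has no ECOGENE line. Separately, passing a reference, Dumper(\%HoNms), prints the result as one nested structure, whereas Dumper(%HoNms) dumps each key and value as a separate item.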
Re: file parsing
by RichardK (Parson) on Jan 19, 2015 at 14:09 UTC

Re: file parsing
by poj (Abbot) on Jan 19, 2015 at 14:20 UTC
by AWallBuilder (Beadle) on Jan 19, 2015 at 15:02 UTC

Re: file parsing
by Anonymous Monk on Jan 19, 2015 at 15:32 UTC