Dear all, this is related to an earlier post. I have made progress, but I am stuck again. I am parsing a file (partial example below) and reading each record into a hash, %values. However, I want to extract the ECOGENE entry from the DBLINKS lines (e.g. DBLINKS - (ECOGENE ...)). I think my %values hash will not have separate key/value pairs for the seven different DBLINKS lines in a record. Is there any way I can extract just the ECOGENE one? I tried one way in my code, but it is wrong, and I now realize the problem is in how the record is read into the hash. Any help appreciated.
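The suspicion about %values can be checked with a minimal sketch that runs the same regex on a few lines copied from the sample record below. It assumes, as in the real file, that each field sits on its own line.

```perl
use strict;
use warnings;

# UNIQUE-ID plus three of the DBLINKS lines from the sample record.
my $record = <<'END';
UNIQUE-ID - EG11751
DBLINKS - (ECHOBASE "EB1701" NIL |pkarp| 3346767936 NIL NIL)
DBLINKS - (ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)
DBLINKS - (CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)
END

# Same regex as in the question. (\S+) stops at the first whitespace,
# so each DBLINKS value is only "(ECHOBASE", "(ECOGENE" or "(CGSC".
my @pairs  = $record =~ /^(\S+)\s+-\s+(\S+)/mg;
my %values = @pairs;

# The regex finds 4 key/value pairs, but the hash keeps only 2 keys:
# each later DBLINKS pair overwrites the earlier one.
printf "pairs matched: %d\n", @pairs / 2;
printf "hash keys:     %d\n", scalar(keys %values);
print  "DBLINKS:       $values{DBLINKS}\n";
```

So the hash ends up holding only the first word of the last DBLINKS line, and the ECOGENE id is never stored at all.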

Portion of the input file:

//
UNIQUE-ID - EG11751
TYPES - BC-5.5.2
TYPES - BC-1.7.9
TYPES - BC-5.5.1
COMMON-NAME - otsA
ACCESSION-1 - b1896
ACCESSION-2 - ECK1895
CENTISOME-POSITION - 42.636864
COMMENT-INTERNAL - 1/24/05 keseler removed pexA as synonym
COMPONENT-OF - COLI-K12-39
COMPONENT-OF - TU0-7722
COMPONENT-OF - TU00391
COMPONENT-OF - TU00312
DBLINKS - (ECOLIHUB "otsA" NIL |kr| 3474243543 NIL NIL)
DBLINKS - (REGULONDB "EG11751" NIL |kr| 3462030648 NIL NIL)
DBLINKS - (ASAP "ABE-0006318" NIL |paley| 3398447608 NIL NIL)
DBLINKS - (ECHOBASE "EB1701" NIL |pkarp| 3346767936 NIL NIL)
DBLINKS - (ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)
DBLINKS - (OU-MICROARRAY "b1896" NIL NIL NIL NIL NIL)
DBLINKS - (CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-40
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-37
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-33
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-49
KNOCKOUT-GROWTH-OBSERVATIONS - OBS0-44
LAST-UPDATE - 3609256889
LEFT-END-POSITION - 1978212
MEMBER-SORT-FN - NUMBERED-CLASS-SORT-FN
PRODUCT - TREHALOSE6PSYN-MONOMER
RIGHT-END-POSITION - 1979636
TRANSCRIPTION-DIRECTION - -
//
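One way around the overwriting is to push every value onto an array per key (a hash of arrays) and then search the DBLINKS array for the ECOGENE entry. This is a minimal sketch, assuming one field per line as in the record above; `parse_record` is a hypothetical helper name, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one record into a hash of arrays, so that
# repeated keys such as DBLINKS keep every value instead of only the last.
sub parse_record {
    my ($record) = @_;
    my %fields;
    # (.*) captures the rest of the line, not just its first word.
    while ($record =~ /^(\S+)[ \t]+-[ \t]+(.*)/mg) {
        push @{ $fields{$1} }, $2;
    }
    return \%fields;
}

my $record = <<'END';
UNIQUE-ID - EG11751
COMMON-NAME - otsA
DBLINKS - (ECHOBASE "EB1701" NIL |pkarp| 3346767936 NIL NIL)
DBLINKS - (ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)
DBLINKS - (CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)
END

my $fields = parse_record($record);

# Pick out the one DBLINKS entry whose database is ECOGENE.
my $ecogene;
for my $link (@{ $fields->{DBLINKS} || [] }) {
    if ($link =~ /^\(ECOGENE\s+"([^"]+)"/) {
        $ecogene = $1;
        last;
    }
}

print "DBLINKS entries: ", scalar @{ $fields->{DBLINKS} }, "\n";
print "ECOGENE id:      $ecogene\n";
```

Single-valued fields are then read as `$fields->{'COMMON-NAME'}[0]`; the extra array level is the price of keeping every DBLINKS line.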

Code:

use strict;
use warnings;
use Data::Dumper;

##
my $inGeneDat = $ARGV[0] || "genes.dat";
open(IN, "<", $inGeneDat) || die "cannot open $inGeneDat\n";

##
my %HoNms;
{
    local $/ = '//';    # each record ends with "//"
    while (my $record = <IN>) {
        # One key/value pair per "KEY - value" line (first word of the value only)
        my %values = $record =~ /^(\S+)\s+-\s+(\S+)/mg;
        next unless exists $values{'UNIQUE-ID'} and exists $values{'ACCESSION-1'};

        # Your code using $values{'UNIQUE-ID'} and other values here
        my $cycID  = $values{'UNIQUE-ID'};
        my $cycLoc = $values{'ACCESSION-1'};
        my $ECKLoc = $values{'ACCESSION-2'};

        # Attempt at the ECOGENE id. %values keeps only the last DBLINKS pair,
        # and (\S+) captured just its first word (e.g. "(CGSC"), so this fails.
        my $EGLocL = $values{'DBLINKS'};
        $EGLocL =~ /"(EG\S+)"/;
        my $EGLoc = $1;

        my $Nm = $values{'COMMON-NAME'};

        $HoNms{$cycID} = { 'acc1' => $cycLoc, 'acc2' => $ECKLoc, 'EG' => $EGLoc, 'nm' => $Nm };
    }
}
print Dumper(\%HoNms);    # pass a reference so the nested structure is shown
close(IN);
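A minimal sketch of how the loop above could be adjusted, assuming records end with "//" as in the sample: keep the hash for the single-valued fields, but pull the ECOGENE id straight out of the raw record with a regex anchored on the DBLINKS line. Anchoring on `(ECOGENE` matters, because in the sample record the REGULONDB link also carries "EG11751", so a bare `/"(EG\S+)"/` run over the whole record would hit that line first. `parse_genes` is a hypothetical wrapper so the logic can be run on an in-memory copy of the sample; for the real file you would pass the opened genes.dat handle instead.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical wrapper: parse a genes.dat-style stream into a hash keyed by UNIQUE-ID.
sub parse_genes {
    my ($fh) = @_;
    my %HoNms;
    local $/ = '//';    # each record ends with "//"
    while (my $record = <$fh>) {
        # Single-valued fields: first word after "KEY - ".
        my %values = $record =~ /^(\S+)[ \t]+-[ \t]+(\S+)/mg;
        next unless exists $values{'UNIQUE-ID'}
                and exists $values{'ACCESSION-1'};

        # Match the ECOGENE DBLINKS line directly in the raw record,
        # so the other DBLINKS lines cannot overwrite or shadow it.
        my ($EGLoc) = $record =~ /^DBLINKS[ \t]+-[ \t]+\(ECOGENE[ \t]+"([^"]+)"/m;

        $HoNms{ $values{'UNIQUE-ID'} } = {
            acc1 => $values{'ACCESSION-1'},
            acc2 => $values{'ACCESSION-2'},
            EG   => $EGLoc,    # undef if the record has no ECOGENE link
            nm   => $values{'COMMON-NAME'},
        };
    }
    return \%HoNms;
}

# Abbreviated copy of the sample record, read through an in-memory filehandle.
my $sample = <<'END';
UNIQUE-ID - EG11751
COMMON-NAME - otsA
ACCESSION-1 - b1896
ACCESSION-2 - ECK1895
DBLINKS - (REGULONDB "EG11751" NIL |kr| 3462030648 NIL NIL)
DBLINKS - (ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)
DBLINKS - (CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)
//
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!\n";
my $HoNms = parse_genes($fh);
close $fh;

$Data::Dumper::Sortkeys = 1;
print Dumper($HoNms);
```

The leftover newline after the final "//" comes back as a short extra "record"; it has no UNIQUE-ID, so the `next unless` line skips it.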

In reply to file parsing by AWallBuilder
