Perhaps it'd be better to invest in a somewhat more generic parser, something like this:

    my (@records, $cur);
    while (<>) {
        chomp;
        if ($_ eq "//") {
            # "//" terminates a record
            push @records, $cur if defined $cur;
            $cur = undef;
        }
        elsif (/^(.+?) - (.+)$/) {
            my ($key, $value) = ($1, $2);
            if (defined $cur->{$key}) {
                # repeated key: collect the values in an array ref
                if (ref $cur->{$key}) {
                    push @{ $cur->{$key} }, $value;
                }
                else {
                    $cur->{$key} = [ $cur->{$key}, $value ];
                }
            }
            else {
                $cur->{$key} = $value;
            }
        }
        else {
            warn "didn't handle input line: $_";
        }
    }
    # keep the last record if the input doesn't end with "//"
    push @records, $cur if defined $cur;

Note that changing @records into a hash keyed by UNIQUE-ID is as simple as:

    my %records = map { $_->{'UNIQUE-ID'} => $_ } @records;
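To make that lookup concrete, here is a small self-contained sketch. The two records are hand-written stand-ins for the parser's output (the second one, EG00000, is made up purely for illustration), so treat it as a demonstration of the idiom rather than real data.

```perl
use strict;
use warnings;

# Stand-in for the parser's output; the second record is invented.
my @records = (
    { 'UNIQUE-ID' => 'EG11751', 'COMMON-NAME' => 'otsA' },
    { 'UNIQUE-ID' => 'EG00000', 'COMMON-NAME' => 'demo' },
);

# Index the records by their UNIQUE-ID field.
my %records = map { $_->{'UNIQUE-ID'} => $_ } @records;

# Fetch one record directly by its ID.
my $gene = $records{'EG11751'};
print "$gene->{'COMMON-NAME'}\n";
```

Two caveats: if two records share a UNIQUE-ID, the later one silently replaces the earlier one, and a record without a UNIQUE-ID ends up under an empty-string key (with a warning under "use warnings").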

Output of the parser above for your example input, as printed by Data::Dumper:

    $VAR1 = [
      {
        'ACCESSION-2' => 'ECK1895',
        'LEFT-END-POSITION' => '1978212',
        'RIGHT-END-POSITION' => '1979636',
        'UNIQUE-ID' => 'EG11751',
        'LAST-UPDATE' => '3609256889',
        'KNOCKOUT-GROWTH-OBSERVATIONS' => [
          'OBS0-40',
          'OBS0-37',
          'OBS0-33',
          'OBS0-49',
          'OBS0-44'
        ],
        'COMMENT-INTERNAL' => '1/24/05 keseler removed pexA as synonym',
        'COMMON-NAME' => 'otsA',
        'DBLINKS' => [
          '(ECOLIHUB "otsA" NIL |kr| 3474243543 NIL NIL)',
          '(REGULONDB "EG11751" NIL |kr| 3462030648 NIL NIL)',
          '(ASAP "ABE-0006318" NIL |paley| 3398447608 NIL NIL)',
          '(ECHOBASE "EB1701" NIL |pkarp| 3346767936 NIL NIL)',
          '(ECOGENE "EG11751" NIL |pick| 3292798423 NIL NIL)',
          '(OU-MICROARRAY "b1896" NIL NIL NIL NIL NIL)',
          '(CGSC "18073" NIL |pkarp| 3035559680 NIL NIL)'
        ],
        'PRODUCT' => 'TREHALOSE6PSYN-MONOMER',
        'ACCESSION-1' => 'b1896',
        'CENTISOME-POSITION' => '42.636864 ',
        'TRANSCRIPTION-DIRECTION' => '-',
        'TYPES' => [
          'BC-5.5.2',
          'BC-1.7.9',
          'BC-5.5.1'
        ],
        'MEMBER-SORT-FN' => 'NUMBERED-CLASS-SORT-FN',
        'COMPONENT-OF' => [
          'COLI-K12-39',
          'TU0-7722',
          'TU00391',
          'TU00312'
        ]
      }
    ];
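As the dump shows, a field holds a plain string when it occurs once and an array ref when it repeats. If you'd rather not check ref() at every use, a small accessor can hide the difference. This is only a sketch: values_for is a hypothetical helper name, not something from the parser, and the sample record is a trimmed copy of the output.

```perl
use strict;
use warnings;

# Hypothetical helper: return every value stored under $key as a list,
# whether the parser kept a plain scalar or an array ref.
sub values_for {
    my ($record, $key) = @_;
    my $value = $record->{$key};
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Trimmed copy of the record shown in the dump.
my $record = {
    'COMMON-NAME' => 'otsA',
    'TYPES'       => [ 'BC-5.5.2', 'BC-1.7.9', 'BC-5.5.1' ],
};

my @names = values_for($record, 'COMMON-NAME');   # single-valued field
my @types = values_for($record, 'TYPES');         # multi-valued field
my @none  = values_for($record, 'SYNONYMS');      # key absent from this record

print scalar(@types), " types: @types\n";
```

The alternative is to have the parser always store array refs, which makes every access uniform at the cost of a noisier dump.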

In reply to Re: file parsing by Anonymous Monk
in thread file parsing by AWallBuilder
