Knowing the record separator (a line containing only "//"), we can set $/ to it, read one record at a time, and do a little "better":

use strict;
use warnings;

my @records;

# Read one GenBank record at a time: each record ends with a "//" line.
$/ = "\n//";

while (defined(my $rec = <DATA>)) {
    # Capture (keyword, value) pairs. A value runs up to the next line that
    # does not start with 10 spaces, so continuation lines stay attached.
    my %fields = $rec =~ /^(?:(?! {10}) *(\S{1,10}))?  (.*?(?=\n(?! {10})|\Z))/gms;
    next unless %fields;    # skip the lone newline left after the final "//"

    # Turn each value into an array of lines with the leading indent removed.
    $fields{$_} = [map {s/^\s*//; $_} split "\n", $fields{$_}] for keys %fields;
    push @records, \%fields;
}

for my $record (@records) {
    print "$_:\n", map{"   $_\n"} @{$record->{$_}} for sort keys %$record;
    print "\n\n";
}

__DATA__
LOCUS       NM_001098210
DEFINITION  Homo sapiens catenin
ACCESSION   NM_001098210
VERSION     NM_001098210.1
KEYWORDS    RefSeq.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
     CDS             269..2614
                     /gene="CTNNB2"
//
LOCUS       NM_001098209            3415 bp    mRNA    linear   PRI 27-APR-2014
DEFINITION  Homo sapiens catenin (cadherin-associated protein), beta 1, 88kDa
            (CTNNB1), transcript variant 2, mRNA.
ACCESSION   NM_001098209 XM_001133660 XM_001133664 XM_001133673 XM_001133675
VERSION     NM_001098209.1 GI:148233337
KEYWORDS    RefSeq.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
     CDS             269..2614
                     /gene="CTNNB1"
                     /gene_synonym="armadillo; CTNNB; MRD19"
                     /codon_start=1
                     /product="catenin beta-1"
                     /protein_id="NP_001091679.1"
                     /db_xref="GI:148233338"
                     /db_xref="CCDS:CCDS2694.1"
                     /db_xref="GeneID:1499"
                     /db_xref="HGNC:HGNC:2514"
                     /db_xref="MIM:116806"
                     /translation="MATQADLMELDMAMEPDRKAAVSHWQQQSYLDSGIHSGATTTAP
                     SLSGKGNPEEEDVDTSQVLYEWEQGFSQSFTQEQVADIDGQYAMTRAQRVRAAMFPET
                     LDEGMQIPSTQFDAAHPTNVQRLAEPSQMLKHAVVNLINYQDDAELATRAIPELTKLL
//

Prints:

ACCESSION:
   NM_001098210
CDS:
   269..2614
   /gene="CTNNB2"
DEFINITION:
   Homo sapiens catenin
KEYWORDS:
   RefSeq.
LOCUS:
   NM_001098210
ORGANISM:
   Homo sapiens
SOURCE:
   Homo sapiens (human)
VERSION:
   NM_001098210.1


ACCESSION:
   NM_001098209 XM_001133660 XM_001133664 XM_001133673 XM_001133675
CDS:
   269..2614
   /gene="CTNNB1"
   /gene_synonym="armadillo; CTNNB; MRD19"
   /codon_start=1
   /product="catenin beta-1"
   /protein_id="NP_001091679.1"
   /db_xref="GI:148233338"
   /db_xref="CCDS:CCDS2694.1"
   /db_xref="GeneID:1499"
   /db_xref="HGNC:HGNC:2514"
   /db_xref="MIM:116806"
   /translation="MATQADLMELDMAMEPDRKAAVSHWQQQSYLDSGIHSGATTTAP
   SLSGKGNPEEEDVDTSQVLYEWEQGFSQSFTQEQVADIDGQYAMTRAQRVRAAMFPET
   LDEGMQIPSTQFDAAHPTNVQRLAEPSQMLKHAVVNLINYQDDAELATRAIPELTKLL
DEFINITION:
   Homo sapiens catenin (cadherin-associated protein), beta 1, 88kDa
   (CTNNB1), transcript variant 2, mRNA.
KEYWORDS:
   RefSeq.
LOCUS:
   NM_001098209            3415 bp    mRNA    linear   PRI 27-APR-2014
ORGANISM:
   Homo sapiens
   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
   Catarrhini; Hominidae; Homo.
SOURCE:
   Homo sapiens (human)
VERSION:
   NM_001098209.1 GI:148233337

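
The field regex above is dense, so here is a sketch of the same idea spelled out with /x and run on a tiny made-up record. It is only an illustration, not a replacement for the script: the name $field_re and the three-line sample are hypothetical, and I am assuming at least two spaces of padding after each keyword, which is what GenBank flat files use. Note that under /x a literal space has to be written as [ ].

```perl
use strict;
use warnings;

# Hypothetical name; the pattern mirrors the one-liner in the script above.
my $field_re = qr{
    ^                       # start of a line (because of /m)
    (?:
        (?![ ]{10})         # not a continuation line (those start with 10+ spaces)
        [ ]*                # optional indent, as in "  ORGANISM" or "     CDS"
        (\S{1,10})          # capture 1: the keyword
    )?
    [ ]{2}                  # assumed padding between keyword and value
    (                       # capture 2: the value
        .*?                 # as little as possible; dot matches newline under /s
        (?= \n (?![ ]{10})  # stop before a newline that starts a new field
          | \Z )            # or at the end of the record
    )
}xms;

# Made-up sample: two keywords, one continuation line, and the "//" terminator.
my $rec = join "\n",
    'SOURCE      Homo sapiens (human)',
    '  ORGANISM  Homo sapiens',
    '            Eukaryota; Metazoa.',
    '//';

# In list context with /g the match returns keyword, value, keyword, value, ...
my %fields = $rec =~ /$field_re/g;

for my $key (sort keys %fields) {
    # Split the value into lines and strip the leading indent from each one.
    my @lines = map { my $l = $_; $l =~ s/^\s+//; $l } split /\n/, $fields{$key};
    print "$key: ", join(' | ', @lines), "\n";
}
```

This should print "ORGANISM: Homo sapiens | Eukaryota; Metazoa." followed by "SOURCE: Homo sapiens (human)". The "//" line never matches because nothing on it is followed by the two padding spaces, which is why the terminator does not show up as a field.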
Perl is the programming world's equivalent of English

In reply to Re^3: multiline in while loop and regular expression by GrandFather
in thread multiline in while loop and regular expression by newtoperlprog
