Hello all.

Having looked at earlier posts on the subject and finding nothing that worked for me (the file information was not displayed when I ran the code), I thought I would submit an inquiry.

In the following file:

LOCUS       NC_004905               1557 bp    cRNA    linear   VRL 27-JUN-2007
DEFINITION  Influenza A virus (A/Hong Kong/1073/99(H9N2)), complete genome.
ACCESSION   NC_004905
VERSION     NC_004905.2  GI:93163150
DBLINK      BioProject: PRJNA14892
KEYWORDS    RefSeq; NP gene; nucleoprotein.
SOURCE      Influenza A virus (A/Hong Kong/1073/99(H9N2))
  ORGANISM  Influenza A virus (A/Hong Kong/1073/99(H9N2))
            Viruses; ssRNA negative-strand viruses; Orthomyxoviridae;
            Influenzavirus A.
REFERENCE   1
  AUTHORS   Lin,Y.P., Shaw,M., Gregory,V., Cameron,K., Lim,W., Klimov,A.,
            Subbarao,K., Guan,Y., Krauss,S., Shortridge,K., Webster,R., Cox,N.
            and Hay,A.
  TITLE     Avian-to-human transmission of H9N2 subtype influenza A viruses:
            relationship between H9N2 and H5N1 human isolates
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 97 (17), 9654-9658 (2000)
   PUBMED   10920197
REFERENCE   2  (bases 1 to 1557)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (21-JUN-2003) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3  (bases 1 to 1557)
  AUTHORS   Lin,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUL-2000) Lin Y., Virology, National Institute for
            Medical Research, The Ridgeway, Mill Hill, London, NW7 1AA, UNITED
            KINGDOM
REFERENCE   4  (bases 1 to 1557)
  AUTHORS   Lin,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-JUN-2000) Lin Y., Virology, National Institute for
            Medical Research, The Ridgeway, Mill Hill, London, NW7 1AA, UNITED
            KINGDOM
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AJ289871.
            On Apr 21, 2006 this sequence version replaced gi:32140158.
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..1557
                     /organism="Influenza A virus (A/Hong Kong/1073/99(H9N2))"
                     /mol_type="viral cRNA"
                     /strain="A/Hong Kong/1073/99"
                     /serotype="H9N2"
                     /db_xref="taxon:130760"
                     /segment="segment 5"
                     /country="Hong Kong:Kowloon"
     gene            40..1536
                     /gene="np"
                     /locus_tag="FLUAVAHHH9N2s5gp1"
                     /db_xref="GeneID:4036076"
     CDS             40..1536
                     /gene="np"
                     /locus_tag="FLUAVAHHH9N2s5gp1"
                     /codon_start=1
                     /product="nucleoprotein"
                     /protein_id="YP_581749.1"
                     /db_xref="GI:93163151"
                     /db_xref="GOA:Q9ICY7"
                     /db_xref="InterPro:IPR002141"
                     /db_xref="UniProtKB/TrEMBL:Q9ICY7"
                     /db_xref="GeneID:4036076"
                     /translation="MASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMC
                     TELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTGGPIYRRRDG
                     KWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMD
                     PRMCSLMQGSTLPRRSGAAGAAIKGVGTMVMELIRMIKRGINDRNFWRGDNGRRTRIA
                     YERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCL
                     PACVYGLAVASGYDFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACH
                     SAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLELRSRYWAIRTRS
                     GGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMES
                     ARPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEEYDN"
ORIGIN      
        1 agcagggtta ataatcactc actgagtgac atcaacatca tggcgtcgca aggcaccaaa
       61 cgatcctatg aacagatgga aactggtgga gaacgccaga atgccactga gatcagggca
      121 tctgttggaa gaatggttgg tggaattggg aggttttacg tacagatgtg cactgaactc
      181 aaactcagcg accaagaagg aaggttgatc cagaacagta taacaataga gagaatggtt
      241 ctctccgcat ttgatgaaag gaggaacagg tacctagagg aacatcccag tgcggggaag
      301 gacccgaaga agaccggagg tccaatctac cgaaggagag acgggaaatg ggtgagagag
      361 ctgattctgt atgacaaaga ggagataagg agaatttggc gtcaagcgaa caatggagaa
      421 gacgcaactg ctggtctcac tcatatgatg atctggcatt ccaacctaaa tgatgccaca
      481 taccagagaa caagagccct cgtgcggact ggaatggacc ccagaatgtg ctctctgatg
      541 caaggatcaa ccctcccgag gagatctgga gctgctggtg cagcaataaa gggagtcggg
      601 acaatggtaa tggaactaat tcggatgata aagcgaggca ttaatgaccg gaacttctgg
      661 agaggcgata atggacgaag aacaaggatt gcatatgaga gaatgtgcaa catcctcaaa
      721 gggaaatttc aaacagcagc acaaagagca atgatggatc aggtgcgaga aagcagaaat
      781 cctgggaatg ctgaaattga agatctcatc tttctggcac ggtctgcact catcctgaga
      841 ggatccgtag cccataagtc ctgcttgcct gcttgtgtgt acgggctcgc tgtggccagt
      901 ggatatgatt ttgagaggga agggtactct ctggttggga tagatccttt ccgtctgctt
      961 cagaacagtc aggtcttcag tcttattaga ccaaatgaga atccagcaca taaaagtcaa
     1021 ttggtatgga tggcatgcca ttctgcagca tttgaggacc tgagagtctc aagtttcatt
     1081 agaggaacaa gagtgatccc aagaggacaa ctatccacta gaggagttca gattgcttca
     1141 aatgagaacg tggaagcaat ggattccagc actcttgaac tgagaagcag atattgggct
     1201 ataaggacca ggagtggagg aaacaccaat caacagagag catctgcagg acaaatcagt
     1261 gtacagccca ctttctcagt acagagaaat cttcccttcg aaagaccgac cattatggct
     1321 gcgtttaagg ggaataccga gggcagaaca tctgacatga ggactgaaat cataaggatg
     1381 atggaaagtg ccagaccaga agatgtgtct ttccaggggc ggggagtctt cgagctctcg
     1441 gacgaaaagg caacgaaccc gatcgtgcct tcctttgaca tgagtaatga aggatcttat
     1501 ttcttcggag acaatgcaga ggaatatgac aattgaggaa aaataccctt gtttcta
//


I don't understand why the following regex doesn't extract the dna from the array:

#!/usr/bin/perl -w use strict; use autodie; my $dna; open FH, '<', 'fluseq.txt'; my @data = <FH>; close FH; for (@data) { s/\s+//g; s/\d+//; s/\/\///; if (/ORIGIN(.*)/ms) { print $1; } }


Thanks all.

In reply to Extracting field information from a GenBank file. by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.