Hi Monkers, I am trying to extract certain IDs from a file. The file looks like this:
Query= sp|P30443|1A01_HUMAN HLA class I histocompatibility antigen, A-1 alpha chain OS=Homo sapiens GN=HLA-A PE=1 SV=1
         (365 letters)

                                                                    Score    E
Sequences producing significant alignments:                        (bits) Value

tr|G1KTN1|G1KTN1_ANOCA Uncharacterized protein OS=Anolis carolin...   242   1e-77
tr|L7MZX2|L7MZX2_ANOCA Uncharacterized protein OS=Anolis carolin...   239   2e-76
tr|H9GR57|H9GR57_ANOCA Uncharacterized protein (Fragment) OS=Ano...   236   4e-75
tr|L7MZP5|L7MZP5_ANOCA Uncharacterized protein OS=Anolis carolin...   233   3e-74
tr|H9G3Y5|H9G3Y5_ANOCA Uncharacterized protein OS=Anolis carolin...   231   1e-73
tr|H9GBT0|H9GBT0_ANOCA Uncharacterized protein (Fragment) OS=Ano...   232   2e-73
tr|H9GTB3|H9GTB3_ANOCA Uncharacterized protein (Fragment) OS=Ano...   220   3e-69
tr|H9GSQ9|H9GSQ9_ANOCA Uncharacterized protein OS=Anolis carolin...   218   2e-68
tr|L7MZR7|L7MZR7_ANOCA Uncharacterized protein (Fragment) OS=Ano...   213   4e-66
tr|H9GRY4|H9GRY4_ANOCA Uncharacterized protein (Fragment) OS=Ano...   209   2e-65
tr|H9GBL3|H9GBL3_ANOCA Uncharacterized protein OS=Anolis carolin...   206   5e-64

>tr|G1KTN1|G1KTN1_ANOCA Uncharacterized protein OS=Anolis carolinensis PE=3 SV=2
          Length = 358

 Score = 242 bits (618), Expect = 1e-77, Method: Composition-based stats.
 Identities = 131/280 (46%), Positives = 175/280 (62%), Gaps = 8/280 (2%)

Query: 24  AGSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQKMEPRAPWI---EQEG 80
           + SHSMRYF TSVS PG+  P+F  VGYVDD +FV ++  A++++  P+ PWI   E+
Sbjct: 25  SSSHSMRYFVTSVSEPGQQVPQFSYVGYVDDQEFVSYN--ASTRRYLPKVPWISKVEKND 82

Query: 81  PEYWDQETRNMKAHSQTDRANLGTLRGYYNQSEDGSHTIQIMYGCDVGPDGRFLRGYRQD 140
           P+YW++ T   + H ++ R +L TL  YYNQS  G HT Q MYGC++  D     GY Q
Sbjct: 83  PDYWERNTLYAQGHERSFRDHLATLAEYYNQS-GGLHTFQWMYGCELRNDWS-KGGYYQY 14

>tr|L7MZX2|L7MZX2_ANOCA Uncharacterized protein OS=Anolis carolinensis GN=LOC100559978 PE=3 SV=1
          Length = 364

 Score = 239 bits (611), Expect = 2e-76, Method: Composition-based stats.
 Identities = 130/274 (47%), Positives = 176/274 (64%), Gaps = 8/274 (2%)

Query: 30  RYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQKMEPRAPWI---EQEGPEYWDQ 86
           RY +TSVS PG+ EP+F +VGYVD+ +FV +DS A  ++  P  PWI   E+E PEYW+Q
Sbjct: 32  RYVYTSVSEPGQQEPQFFSVGYVDEQEFVSYDSKA--KRRFPAVPWIRKVEEEDPEYWEQ 89
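A quick check of what the two capture patterns used in the script below return when applied to single header lines. The two strings are copied from the sample report; the variable names are only illustrative. It shows that ([^\s]+), which is the same as \S+, grabs the whole pipe-delimited identifier, not only alphanumeric characters.

```perl
use strict;
use warnings;

# Header lines copied from the sample report above (illustrative variable names).
my $query_line = 'Query= sp|P30443|1A01_HUMAN HLA class I histocompatibility antigen, A-1 alpha chain OS=Homo sapiens GN=HLA-A PE=1 SV=1';
my $hit_line   = '>tr|G1KTN1|G1KTN1_ANOCA Uncharacterized protein OS=Anolis carolinensis PE=3 SV=2';

# Same two patterns as in the script: capture the first run of
# non-whitespace characters after "Query= " and after ">".
my ($query_id) = $query_line =~ /^Query=\s([^\s]+)\s.*$/;
my ($hit_id)   = $hit_line   =~ /^>([^\s]+)\s.*$/;

print "$query_id\t$hit_id\n";
```

This prints sp|P30443|1A01_HUMAN and tr|G1KTN1|G1KTN1_ANOCA separated by a tab, so each pattern is fine on its own line.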
I would like to extract the ID after "Query=" (the non-whitespace characters that follow it, e.g. sp|P30443|1A01_HUMAN) and the first ID that comes after a ">" (e.g. tr|G1KTN1|G1KTN1_ANOCA). Each regex works fine on its own for extracting the ID after "Query=" or after ">". What I want is to print the "Query=" ID followed by only the first ">" ID, and then have the program find the next "Query=" and do the same. My code works for the first regex, but nothing is printed for the second one. This question may be silly, but I'm relatively new to Perl. My code is below:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;

my $file=$ARGV[0];
open (FILE,$file);
while(<FILE>)
{
    my @query=$_;
    foreach my $a (@query)
    {
        next until $a=~/^Query=.*$/;
        if($a=~/^Query=\s([^\s]+)\s.*$/)
        {
            print "$1\t";
            next until $a=~/^>.+$/;
            if($a=~/^>([^\s]+)\s.*$/)
            {
                print "$1\n";
            }
        }
    }
}
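Why the second pattern never prints: "next until COND" is a statement modifier, so it does not wait for a later line. Perl tests COND against the current $a, which is still the "Query=" line, finds no match and runs next. Because @query holds only that one line, next ends the foreach, and the while reads the following line, where "next until $a=~/^Query=.*$/" skips every line that does not start with "Query=". The ">" branch is therefore never reached. (Naming a variable $a is also best avoided, since sort uses $a and $b.)

One way round this is to carry state from line to line. The sketch below assumes, as in the sample, that every "Query=" and ">" header starts its own line; first_hits is a hypothetical helper name, and printing an empty second column for a query with no hit is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads a BLAST text report from $fh and returns one
# [query_id, first_hit_id] pair per "Query=" line. first_hit_id stays undef
# when no ">" line appears before the next "Query=".
sub first_hits {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        if ($line =~ /^Query=\s+(\S+)/) {
            # A new query block starts; no hit recorded for it yet.
            push @pairs, [ $1, undef ];
        }
        elsif ($line =~ /^>(\S+)/ && @pairs && !defined $pairs[-1][1]) {
            # First ">" line of the current block; later ones are ignored.
            $pairs[-1][1] = $1;
        }
    }
    return @pairs;
}

# Command-line use: perl script.pl report.txt
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    for my $pair (first_hits($fh)) {
        my ($query_id, $hit_id) = @$pair;
        print $query_id, "\t", (defined $hit_id ? $hit_id : ''), "\n";
    }
    close $fh;
}
```

For the sample report the first output line would be sp|P30443|1A01_HUMAN, a tab, then tr|G1KTN1|G1KTN1_ANOCA. The three-argument open with "or die" also reports a missing file instead of silently reading nothing.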

In reply to Help with regex by rocketperl
