Hi vinoth.ree,
hello again.
Can you help once more?
I was using the code you wrote and it was working fine. But my input files contained redundant entries, so the output of the pattern matching was, as you can imagine, huge.
So I decided to remove the redundancy from both input files before pattern matching.
But with the non-redundant input files the pattern matching still does not give the output it should: the resulting output file has multiple entries, which makes it redundant and bulky again. My files:

file 1
LOC_Os01g01010.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01019.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01030.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.4 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.3 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01050.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01050.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01060.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01070.3 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01070.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01070.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01080.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01080.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01080.3 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01090.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01100.1 : PS00022 EGF_1 EGF-like domain signature 1.

file 2
LOC_Os01g01010.1 3017 : uORF [3,233] : ATG AGCTGGTGGGGATGCTCTAAGAGAACG
+AGAGAAGCACAGAGCAGATAAACCACACCCACAGGCACCACCGTCCTTGTTGGTAATGAAGAAGACGAG
+ACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATCGACGGCGGAGAGAGCTACAGAAAC
+ATCGATGCCTCCTGTCCAATCCCCCCATCCCATTCGGTAGTTGGATTGAAGACTACCGAA TAA
LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCAC
+TAGGAAACACGACGGAGGCGGAGATGATCGACGGCGGAGAGAGCTACAGAAACATCGATGCCTCCTGTC
+CAATCCCCCCATCCCATTCGG TAG
LOC_Os01g01019.1 1127 : CPE [1010,1127] : TTTTTAAT TTTTCGATAGCCAAATATT
+AACTATTTAGCGACTTTATTGTCTGGTGTCCGAAGAAGAATATATGTAAATGACATTACCAT AATAAA
+ TGTTGAATGCTTCATCAAATTTT
LOC_Os01g01030.1 2464 : IRES [2366,2464] : TAACT GAATTA GTATTC TA AGAA
+T ATGTC AGTTT ACAAT CTTA ATTCT TAA GAAAGT CTAAA AGTCG TGC ATGTGC GTTC
+CGA GCACAC ACTTTTTCGT
LOC_Os01g01040.4 1524 : IRES [1436,1524] : AACTA CATT GTGGAG AT TAGCAA
+ CGAAAAT GTGCTA GGCCC AGGT GAGCT T TTCTAG TGATT GT TGATA CCTACATA AG
+TCA TCTTTCC
LOC_Os01g01040.1 2508 : IRES [2418,2508] : TGTTG TTGTT GACTA T GTGGT A
+CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA
+TTGTG TCAATTA
LOC_Os01g01040.3 2583 : IRES [2493,2583] : TGTTG TTGTT GACTA T GTGGT A
+CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA
+TTGTG TCAATTA
LOC_Os01g01040.2 2482 : IRES [2392,2482] : TGTTG TTGTT GACTA T GTGGT A
+CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA
+TTGTG TCAATTA
LOC_Os01g01050.2 1996 : IRES [1911,1996] : GTTGG TCTCA TTTTCG TT TGCTG
+ CTGGTTAC TTGTA TTAAT ACATT ATAGA AAA TGAGTA CA TAAAT AT ACATG ACGA T
+ATGA TCC
LOC_Os01g01050.1 2039 : IRES [1954,2039] : GTTGG TCTCA TTTTCG TT TGCTG
+ CTGGTTAC TTGTA TTAAT ACATT ATAGA AAA TGAGTA CA TAAAT AT ACATG ACGA T
+ATGA TCC
LOC_Os01g01060.1 920 : K-BOX [778,785] : CTGTGATT
LOC_Os01g01070.3 1369 : uORF [19,87] : ATG CGAACGAGCACCGGATCCGCTGCGGCT
+GCTCGGCGTCGGGTCGGAGGTGAGGTCTCGAAACCC TAG
LOC_Os01g01070.1 1568 : IRES [1465,1568] : AGCAAG TTTGTT TGGGG AG GATG
+TACT GGAATAAG GGTATAGT AGTAGTA GGAAT TATTATG GCAC ATTTG CATGCT TT GG
+CATA TGGCACTC TGAGTT TTATT
LOC_Os01g01070.2 1562 : IRES [1459,1562] : AGCAAG TTTGTT TGGGG AG GATG
+TACT GGAATAAG GGTATAGT AGTAGTA GGAAT TATTATG GCAC ATTTG CATGCT TT GG
+CATA

I only want to match the pattern "(LOC_Os0[1-7]g[0-9]*.[0-9])\s"
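
For illustration only, here is a minimal sketch of matching on the ID so that each ID is reported at most once. It is written in Python rather than Perl and is not the code from the earlier reply. It assumes that every record starts with its ID at the beginning of a line, that wrapped sequence lines (the ones starting with "+") belong to the record above them, and that the dot in the pattern is meant literally, so it is escaped here. The names `ID_RE`, `index_by_id` and `join_on_id` are made up for this example.

```python
import re

# The pattern from the post, with the dot escaped (assumption: a literal dot
# was intended). re.match only matches at the start of a line, so wrapped
# "+..." continuation lines are never treated as new records.
ID_RE = re.compile(r"(LOC_Os0[1-7]g[0-9]*\.[0-9])\s")


def index_by_id(lines):
    """Map each ID to the first line that starts with it; later repeats are dropped."""
    records = {}
    for line in lines:
        m = ID_RE.match(line)
        if m:
            # setdefault keeps the first record seen for an ID
            records.setdefault(m.group(1), line.rstrip("\n"))
    return records


def join_on_id(file1_lines, file2_lines):
    """Return one (id, file1 record, file2 record) tuple per ID found in both inputs."""
    first = index_by_id(file1_lines)
    second = index_by_id(file2_lines)
    # dicts keep insertion order, so the result follows the order of file 1
    return [(key, first[key], second[key]) for key in first if key in second]
```

To try it on real files, pass the open file handles, for example `join_on_id(open("file1.txt"), open("file2.txt"))` (the file names are placeholders). Because both inputs are indexed by ID before they are combined, a repeated ID in either file cannot multiply the output. Note that only the first line of each file 2 record is kept, so the wrapped sequence lines would need to be appended separately if the full sequence is wanted in the output.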

In reply to Re^5: match pattern from two different file by Anonymous Monk
in thread match pattern from two different file by Anonymous Monk
