This is a refined version of my previous question, which people told me was ambiguous. I have reworded it in the hope of getting a correct answer.

My problem: I need to parse a chemical file and load it into a MySQL database using CGI.

The format of the chemical file is as follows:

---------- data ----------
$$$$ - delimiter for a single chemical file
---------- data ----------
$$$$ - delimiter for a single chemical file
---------- data ----------
$$$$

My full file has hundreds of such structures. Below is a sample of the file with data for 3 chemicals:

(+)-catechin
SMI2MOL
21 23 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 1 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 13 1 0 0 0 0
19 21 1 0 0 0 0
M END
> <$NAM>
(+)-catechin
> <Formula>
C15H14O6
> <MolWeight>
290.26806
> <ChemBankID>
1254
> <CompoundName>
(+)-catechin
> <Calbiochem Catalog>
219250
> <MicroSource Catalog>
210205
$$$$
(+)-himbacine
SMI2MOL
25 28 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 2 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 1 0 0 0 0 0
1 2 1 0 0 0 0
2 25 1 0 0 0 0
14 25 1 0 0 0 0
M END
> <$NAM>
(+)-himbacine
> <Formula>
C22H35NO2
> <MolWeight>
345.51884
> <ChemBankID>
1861
> <CompoundName>
(+)-himbacine
> <Calbiochem Catalog>
377200
$$$$
(+)-methamphetamine
SMI2MOL
11 11 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
10 11 1 0 0 0 0
M END
> <$NAM>
(+)-methamphetamine
> <Formula>
C10H15N
> <MolWeight>
149.23284
> <ChemBankID>
1568
> <CompoundName>
(+)-methamphetamine
> <MicroSource Catalog>
1900033
$$$$
#I TRUNCATED THE FILE HERE LIMITING TO 3 CHEMICALS#####

Task 1: First I need to print the above file in the following format:

the formula of chemical one is : C15H14O6
the MolWeight of chemical one is : 290.26806
the ChemBankID of chemical one is : 1254

the formula of chemical two is : C22H35NO2
the MolWeight of chemical two is : 345.51884
the ChemBankID of chemical two is : 1861

and so on, for hundreds of such chemicals.

Note: in most cases the value (formula, MolWeight, etc.) is on the line directly below its tag:

> <something>
some data
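To make Task 1 concrete, here is a minimal sketch of reading the file one line at a time, so that only the current record is held in memory. It is written in Python purely for illustration; in Perl the same record-at-a-time idea can be had by setting the input record separator `$/` to the `$$$$` line. The names `iter_records` and `report` are made up for this example, the sample is cut down to two records with only the three tags of interest, and the output uses the tag name and a numeral ("chemical 1") rather than the exact wording shown above.

```python
import io
import re

# Matches a tag line such as "> <Formula>" and captures the tag name.
TAG = re.compile(r"^>\s*<([^>]+)>")

def iter_records(handle):
    """Yield one dict of tag -> value per record, reading line by line."""
    fields, tag = {}, None
    for raw in handle:
        line = raw.strip()
        match = TAG.match(line)
        if line == "$$$$":            # end of one chemical record
            yield fields
            fields, tag = {}, None
        elif match:
            tag = match.group(1)      # remember which tag we just saw
        elif tag is not None:
            if line:                  # the value sits on the line below its tag
                fields[tag] = line
            tag = None

def report(handle, wanted=("Formula", "MolWeight", "ChemBankID")):
    """Return the report lines for the wanted tags of every record."""
    lines = []
    for number, fields in enumerate(iter_records(handle), start=1):
        for name in wanted:
            if name in fields:
                lines.append(f"the {name} of chemical {number} is : {fields[name]}")
        lines.append("")              # blank line between chemicals
    return lines

# Cut-down sample: two records, structure lines mostly omitted.
sample = """(+)-catechin
SMI2MOL
M END
> <Formula>
C15H14O6
> <MolWeight>
290.26806
> <ChemBankID>
1254
$$$$
(+)-himbacine
SMI2MOL
M END
> <Formula>
C22H35NO2
> <MolWeight>
345.51884
> <ChemBankID>
1861
$$$$
"""

for line in report(io.StringIO(sample)):
    print(line)
```

With a real file, `io.StringIO(sample)` would be replaced by an open file handle; because the handle is iterated line by line, memory use should depend on the size of one record rather than on the size of the whole file.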

Task 2: I want to load the formula, MolWeight, and ChemBankID into the MySQL database using CGI.
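As a rough illustration of the loading step, the sketch below inserts the three values in batches using parameter placeholders. It is in Python and uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server; with a MySQL driver the connection call would differ and the placeholder is typically `%s` instead of `?`. The table name `chemicals`, its column names, and the function `load` are assumptions made up for this example, and the CGI layer is left out entirely.

```python
import sqlite3

# The three records from the sample file, as tag -> value dicts.
records = [
    {"Formula": "C15H14O6", "MolWeight": "290.26806", "ChemBankID": "1254"},
    {"Formula": "C22H35NO2", "MolWeight": "345.51884", "ChemBankID": "1861"},
    {"Formula": "C10H15N", "MolWeight": "149.23284", "ChemBankID": "1568"},
]

def load(conn, records, batch_size=1000):
    """Insert records in batches so a huge file is never held in memory at once."""
    # Hypothetical schema: one row per chemical, keyed by ChemBankID.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chemicals ("
        "chembank_id INTEGER PRIMARY KEY, "
        "formula TEXT NOT NULL, "
        "mol_weight REAL NOT NULL)"
    )
    sql = "INSERT INTO chemicals (chembank_id, formula, mol_weight) VALUES (?, ?, ?)"
    batch, total = [], 0
    for rec in records:
        batch.append((int(rec["ChemBankID"]), rec["Formula"], float(rec["MolWeight"])))
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)   # placeholders keep values out of the SQL text
            total += len(batch)
            batch = []
    if batch:                              # flush the final partial batch
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total

conn = sqlite3.connect(":memory:")         # in-memory stand-in database
inserted = load(conn, records)
print(inserted)
print(conn.execute(
    "SELECT formula, mol_weight FROM chemicals WHERE chembank_id = 1861"
).fetchone())
```

Since `load` accepts any iterable, it could be fed records from a generator that reads the file one record at a time, rather than from a list built up front.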

That's it.

I tried using arrays, but ran into problems once the full file size exceeded several hundred MB, as that used a lot of memory. Regexes didn't bring me much luck either.

So please help me with this. I would be grateful if you could offer full, tested source code.

Thanks in advance...

Edited by Chady -- added paragraph tags, code tags, and a readmore tag


In reply to parsing file using metacharacters -new by myraja
in thread parsing using metacharacters by myraja
