myraja has asked for the wisdom of the Perl Monks concerning the following question:

(+)-catechin
SMI2MOL
21 23 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 1 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 13 1 0 0 0 0
19 21 1 0 0 0 0
M END
> <$NAM>
(+)-catechin
> <Formula>
C15H14O6
> <MolWeight>
290.26806
> <ChemBankID>
1254
> <CompoundName>
(+)-catechin
> <Calbiochem Catalog>
219250
> <MicroSource Catalog>
210205
$$$$
(+)-himbacine
SMI2MOL
25 28 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 2 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 1 0 0 0 0 0
1 2 1 0 0 0 0
2 25 1 0 0 0 0
14 25 1 0 0 0 0
M END
> <$NAM>
(+)-himbacine
> <Formula>
C22H35NO2
> <MolWeight>
345.51884
> <ChemBankID>
1861
> <CompoundName>
(+)-himbacine
> <Calbiochem Catalog>
377200
$$$$
(+)-methamphetamine
SMI2MOL
11 11 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
10 11 1 0 0 0 0
M END
> <$NAM>
(+)-methamphetamine
> <Formula>
C10H15N
> <MolWeight>
149.23284
> <ChemBankID>
1568
> <CompoundName>
(+)-methamphetamine
> <MicroSource Catalog>
1900033
$$$$
****************
The above is a molecular structure file. I need to extract the molecular name, molecular weight, formula, compound name, and the other data that follows each > <something> tag in the file, and load it into MySQL using CGI. I tried regular expressions, but had no luck. Please help. Thanks in advance.

Edited by Chady -- resurrected doctext.

Replies are listed 'Best First'.
Re: parsing using metacharacters
by graff (Chancellor) on Feb 19, 2004 at 06:47 UTC
    Well, I'll guess that you already have your table in place and you know what data goes into what columns, etc... As for splitting up the data stream, something like this, maybe? (untested)
    {
        local $/ = "\$\$\$\$\n";  # apparent record delimiter
        while (<>) {
            my @lines = split( /\n/ );
            my %fields = ();
            while ( @lines ) {
                $_ = shift @lines;
                if ( /^> <([^>]+)/ ) {
                    my $fldname = $1;
                    $fldname = "MolName" if ( $fldname eq '$NAM' );
                    $fields{$fldname} = shift @lines;
                }
            }
            # do something to put %fields into mysql...
        }
    }
    That assumes that all the values you want (following the lines that start with ">") really are single-line values, as shown in your brief example. Note that I change the "$NAM" string into something that's less likely to get you into trouble elsewhere. =)
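The "do something to put %fields into mysql" step is left open in the code above. Here is one possible sketch of it, under stated assumptions: the table name `compounds` and the lower-cased, underscore-separated column names are hypothetical (tags such as "Calbiochem Catalog" contain spaces, so they need some mapping to column names). The sub only builds a statement with placeholders and the matching bind values; the commented lines at the end show where DBI's `prepare` and `execute` would come in once a database handle exists.

```perl
use strict;
use warnings;

# Turn a tag such as "Calbiochem Catalog" into a column-safe name.
# This naming scheme is an assumption, not something from the thread.
sub column_name {
    my ($tag) = @_;
    ( my $col = $tag ) =~ s/\W+/_/g;    # runs of non-word characters become "_"
    return lc $col;
}

# Build an INSERT statement with "?" placeholders plus the bind values.
sub build_insert {
    my ( $table, %fields ) = @_;
    my @tags = sort keys %fields;       # fixed order keeps columns and values aligned
    my @cols = map { column_name($_) } @tags;
    my $sql  = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join( ', ', @cols ), join( ', ', ('?') x @cols );
    return ( $sql, @fields{@tags} );
}

my %fields = (
    MolName   => '(+)-catechin',
    Formula   => 'C15H14O6',
    MolWeight => '290.26806',
);
my ( $sql, @bind ) = build_insert( 'compounds', %fields );   # hypothetical table
print "$sql\n";
print join( ' | ', @bind ), "\n";

# With a connected DBI handle $dbh (connection details not shown):
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@bind);
```

Placeholders leave the quoting of values such as "(+)-catechin" to the driver, which is safer than interpolating them into the SQL string.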
Re: parsing using metacharacters
by BrowserUk (Patriarch) on Feb 19, 2004 at 06:59 UTC

    Your question is ambiguous. When you say "next to all > <something.. >", do you mean the next line like this?

    #! perl -sw
    use strict;

    m[> <(.*)>] and print "$1: ", scalar <STDIN> while <STDIN>;

    __END__
    P:\test>perl 330148.pl < 330148.dat
    $NAM: (+)-catechin
    Formula: C15H14O6
    MolWeight: 290.26806
    ChemBankID: 1254
    CompoundName: (+)-catechin
    Calbiochem Catalog: 219250
    MicroSource Catalog: 210205
    $NAM: (+)-himbacine
    Formula: C22H35NO2
    MolWeight: 345.51884
    ChemBankID: 1861
    CompoundName: (+)-himbacine
    Calbiochem Catalog: 377200
    $NAM: (+)-methamphetamine
    Formula: C10H15N
    MolWeight: 149.23284
    ChemBankID: 1568
    CompoundName: (+)-methamphetamine
    MicroSource Catalog: 1900033

    Or everything from the > <something> to the next?

    If the latter, then you could set $/="\n> <"; and then each read will get you an entire multi-line record.

    #! perl -sw
    use strict;

    local $/ = "\n> <";

    m[([^>]+)>\n(.*)]sm and print "$1\n$2\n---------\n" while <STDIN>;
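For the multi-line case, a different technique from the one above can also work: read one `$$$$`-terminated record at a time, then run a single global regex over it that captures each tag together with everything up to the next tag line. The following is only a sketch of that idea. The sub names are made up for illustration, the `Remark` tag with a two-line value is hypothetical sample data, and the in-memory filehandle merely stands in for the real file.

```perl
use strict;
use warnings;

# Collect every "> <tag>" item from one record into a hash.
# A value runs until the next tag line, the "$$$$" line, or end of input,
# so multi-line values are kept intact.
sub parse_data_items {
    my ($record) = @_;
    my %fields;
    while ( $record =~ /^> <([^>]+)>[^\n]*\n(.*?)(?=^> <|^\$\$\$\$|\z)/msg ) {
        my ( $tag, $value ) = ( $1, $2 );
        $value =~ s/\s+\z//;    # trim trailing newlines and blank lines
        $fields{$tag} = $value;
    }
    return %fields;
}

# Read "$$$$"-terminated records from a filehandle; one hash ref per record.
sub read_records {
    my ($fh) = @_;
    local $/ = "\$\$\$\$\n";    # record delimiter, as in the replies above
    my @records;
    while ( my $record = <$fh> ) {
        my %fields = parse_data_items($record);
        push @records, \%fields;
    }
    return @records;
}

# Sample input; the "Remark" item is hypothetical.
my $data = <<'END';
M END
> <$NAM>
(+)-catechin
> <Remark>
first line
second line
$$$$
M END
> <$NAM>
(+)-himbacine
$$$$
END

open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
for my $rec ( read_records($fh) ) {
    print "$_ => $rec->{$_}\n" for sort keys %$rec;
    print "----\n";
}
```

Keeping one hash per record makes it straightforward to hand each compound to whatever database code comes next.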

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Timing (and a little luck) are everything!
Re: parsing using metacharacters
by kvale (Monsignor) on Feb 19, 2004 at 06:49 UTC
    The following code demonstrates how to parse this sort of tag:
    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        if (/\${4}/) {
            print "End of record\n";
            last;
        }
        next unless /^> <([^>]+)>$/;
        my $tag = $1;
        chomp( my $property = <DATA>);
        print "tag: $tag property: $property\n";
    }
    __DATA__
    (+)-catechin
    SMI2MOL
    21 23 0 0 0 0 0 0 0 0999 V2000
    0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 1 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0
    1 2 1 0 0 0 0
    2 13 1 0 0 0 0
    19 21 1 0 0 0 0
    M END
    > <$NAM>
    (+)-catechin
    > <Formula>
    C15H14O6
    > <MolWeight>
    290.26806
    > <ChemBankID>
    1254
    > <CompoundName>
    (+)-catechin
    > <Calbiochem Catalog>
    219250
    > <MicroSource Catalog>
    210205
    $$$$
    Once you have the tags, you can write glue code to insert the record into a database.

    -Mark

      why if (/\${4}/)
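On the pattern asked about here: in a Perl regex, `\$` is a literal dollar sign and `{4}` is a quantifier meaning "exactly four of the preceding item", so `/\${4}/` matches four consecutive dollar signs, which is the `$$$$` line that ends each record. The backslash matters because an unescaped `$` would be read as an end-of-line anchor or as the start of a variable. A small check, where `is_record_end` is a name made up for this illustration:

```perl
use strict;
use warnings;

# True (1) when the line contains the "$$$$" record terminator, else 0.
sub is_record_end {
    my ($line) = @_;
    return $line =~ /\${4}/ ? 1 : 0;
}

# Single-quoted strings, so "$" is not interpolated in the test lines.
for my $line ( '$$$$', '> <$NAM>', 'M END', '$$$' ) {
    printf "%-10s %s\n", $line,
        is_record_end($line) ? 'end of record' : 'not a terminator';
}
```

Note that the pattern is not anchored, so it would also match a longer line that merely contains `$$$$`; `/^\${4}$/` would be stricter.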