in reply to Re: Re: Regexps for Parsing Brackets in Chemical Formulae
in thread Regexps for Parsing Brackets in Chemical Formulae

You were close. That should do it:
use strict;
my %count;
# added gratuitous parentheses for embedded formula testing sake.
$_ = 'Mo(P(H)3)4(CO)(NH2C2(H)5)';
# at each iteration, expand the subformula with the rightmost left parenthesis.
# quit when there are no more parentheses.
s/(.*)\((.*?)\)(\d*)/$1 . $2 x ($3 ? $3 : 1) /e while m/\(/;
s/([A-Z](?:[a-z])?)(\d*)/ $count{$1} += $2 ? $2 : 1 ;''/eg;
printf "%-2s %3d\n", $_, $count{$_} for sort keys %count;
It prints:
C    3
H   19
Mo   1
N    1
O    1
P    4
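
For reuse, the same two-pass idea can be wrapped in a subroutine. This is only a sketch: the name count_atoms and the die on unbalanced parentheses are additions made here for illustration, not part of the code above. The guard is there because the loop would otherwise spin forever on a formula with an unmatched "(".

```perl
use strict;
use warnings;

# count_atoms is a hypothetical helper name used for illustration.
# It takes a formula string and returns a hash reference: element => count.
sub count_atoms {
    my ($formula) = @_;
    my %count;

    # Expand the group that starts at the rightmost "(": the greedy (.*)
    # runs up to the last "(", the lazy (.*?) stops at the first ")" after
    # it, and (\d*) captures the optional multiplier.
    while ( $formula =~ /\(/ ) {
        $formula =~ s/(.*)\((.*?)\)(\d*)/$1 . $2 x ($3 ? $3 : 1)/e
            or die "unbalanced parentheses in '$formula'\n";
    }

    # Tally each element symbol with its optional count (default 1).
    while ( $formula =~ /([A-Z][a-z]?)(\d*)/g ) {
        $count{$1} += $2 ? $2 : 1;
    }
    return \%count;
}

my $count = count_atoms('Mo(P(H)3)4(CO)(NH2C2(H)5)');
printf "%-2s %3d\n", $_, $count->{$_} for sort keys %$count;
```

Called on the test formula, it should print the same table as above.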

-- stefp

Re: Re: Re: Re: Regexps for Parsing Brackets in Chemical Formulae
by Elgon (Curate) on Nov 05, 2001 at 09:26 UTC

    Stefp,

    Muchas gracias - one minor alteration to take account of the fact that certain artificial elements have, under certain nomenclatures, three letters rather than one or two...

    s/([A-Z](?:[a-z]{0,2})?)(\d*)/  $count{$1} += $2 ? $2  : 1 ;''/eg;

    Otherwise, perfect!

    Ta, Elgon.

    "Without evil there can be no good, so it must be good to be evil sometimes.
    --Satan, South Park: Bigger, Longer, Uncut.

      Not quite perfect:

         s/([A-Z][a-z]{0,2})(\d*)/ $count{$1} += $2 ? $2 : 1; '' /eg;

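      is cleaner. The (?:) was an unneeded leftover in my code, and once you added the {0,2} quantifier, the ? quantifier became redundant. Note that {2}? is not a substitute for {0,2}: in Perl it is the non-greedy form of {2} and still requires exactly two lowercase letters.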
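
      A quick way to compare these quantifiers is to run each pattern over a one-, two- and three-letter symbol. This is a sketch for checking only: the anchors, the labels and the sample symbols (Uuq stands in for a three-letter systematic name) are choices made here, not part of the thread's code.

      ```perl
      use strict;
      use warnings;

      # Illustrative symbols only: one, two and three letters.
      my @symbols = ('C', 'Mo', 'Uuq');

      # Anchored so that each pattern must match the whole symbol.
      my %pattern = (
          'with (?:...)?' => qr/^[A-Z](?:[a-z]{0,2})?$/,
          'plain {0,2}'   => qr/^[A-Z][a-z]{0,2}$/,
          '{2}?'          => qr/^[A-Z][a-z]{2}?$/,
      );

      for my $name (sort keys %pattern) {
          my @matched = grep { $_ =~ $pattern{$name} } @symbols;
          printf "%-14s matches: %s\n", $name, join(' ', @matched);
      }
      ```

      The two {0,2} forms accept all three symbols, while the {2}? form accepts only the three-letter one.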

      Strangely for the golfers, {,2} is not supported; one would expect it to be, since {2,} is.

      -- stefp