Your question has already been answered well, but since my brain is in parsing mode, I thought I'd bring this up. The reason this is hard to do with a regex is that (barring (??{code}) directives in the regex) regular expressions can't match arbitrarily deeply nested parentheses. Since your data format allows nesting inside parentheses, a parser is the better tool.
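To make the "barring" part concrete, here is a minimal sketch of a pattern that does follow nesting. It uses the recursive subpattern (?1), a relative of (??{code}) that needs Perl 5.10 or later. The name is_balanced is made up for this illustration. Note that it only answers "are the parentheses balanced?"; it does not help with computing anything from the nested pieces.

```perl
use strict;
use warnings;

# Group 1 is one parenthesised chunk; (?1) re-enters group 1, which is
# how the pattern follows nesting to any depth (Perl 5.10+).
my $balanced = qr/^ ( \( (?: [^()]++ | (?1) )* \) ) $/x;

# Hypothetical helper name, used only for this sketch.
sub is_balanced {
    my ($s) = @_;
    return $s =~ $balanced ? 1 : 0;
}

for my $s ( '(CO3)', '((OH)2(CO3))', '((CO3)' ) {
    printf "%-14s %s\n", $s, is_balanced($s) ? 'balanced' : 'not balanced';
}
```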

Writing a parser for such a simple notation is not that hard. You can make it even easier by combining the weight calculation with the parsing itself. This is called syntax-directed evaluation. You don't see it much in parsers for programming languages, but it works well for simpler expression languages where each part of the expression has a value and you are parsing only to compute the final value (think of a simple math expression calculator).

use Parse::RecDescent;
use List::Util 'sum';
use vars '%weights';

# rounded atomic weights, just enough for the example
%weights = qw( C 12 O 16 Pb 207 );

# each action returns the weight of whatever its rule just matched
my $g = Parse::RecDescent->new(<<'END_GRAMMAR');
weight:   compound           { $item[1] }
compound: group(s)           { ::sum( @{$item[1]} ) }
group:    element /\d+/      { $item[1] * $item[2] }
        | element            { $item[1] }
element:  /[A-Z][a-z]*/      { $::weights{ $item[1] } }
        | "(" compound ")"   { $item[2] }
END_GRAMMAR

print $g->weight("Pb(CO3)2"), $/;   # prints 327

This is probably what those other CPAN modules are doing. Actually, since they do more than just compute the weight, they probably parse the chemical formula into a tree structure first, and then do the weight calculation on that tree. If you only need the weights, you can skip the awkward intermediate tree representation.
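
If you'd rather not pull in a parser generator, the same grammar can be written by hand. What follows is only a sketch of the idea, using core Perl: one sub per rule, each consuming its input from the front of the string and returning a weight, so nothing resembling a tree is ever built. The sub names and error messages are my own invention, and unlike the grammar above it insists that the whole string be consumed.

```perl
use strict;
use warnings;

# Same rounded weights as above; extend the table as needed.
my %weights = ( C => 12, O => 16, Pb => 207 );

# compound := group+
sub compound {
    my ($in) = @_;                      # reference to the remaining input
    my $total = group($in);             # at least one group is required
    $total += group($in) while $$in =~ /^[A-Z(]/;
    return $total;
}

# group := element count?
sub group {
    my ($in) = @_;
    my $w     = element($in);
    my $count = ( $$in =~ s/^(\d+)// ) ? $1 : 1;
    return $w * $count;
}

# element := symbol | "(" compound ")"
sub element {
    my ($in) = @_;
    if ( $$in =~ s/^([A-Z][a-z]*)// ) {
        my $sym = $1;
        die "unknown element '$sym'\n" unless exists $weights{$sym};
        return $weights{$sym};
    }
    if ( $$in =~ s/^\(// ) {
        my $w = compound($in);          # recursion handles the nesting
        $$in =~ s/^\)// or die "expected ')' before '$$in'\n";
        return $w;
    }
    die "expected an element or '(' at '$$in'\n";
}

# Entry point: parse the whole formula and return its weight.
sub weight {
    my ($formula) = @_;
    my $w = compound( \$formula );
    die "unexpected trailing input '$formula'\n" if length $formula;
    return $w;
}

print weight("Pb(CO3)2"), "\n";         # prints 327
```

Each sub returns a number as soon as it has matched its piece, which is all syntax-directed evaluation amounts to here.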

blokhead


In reply to Re: Regular Expressions and atomic weights by blokhead
in thread Regular Expressions and atomic weights by hokie
