Dear all,

I'm tackling a problem for which I could manually hard-code the result, but I'm very aware that I can achieve the same results with a regexp string, which would be more useful...

I have a list of chemical formulas, a sample of which is listed below. For each formula, I simply want to extract the elements it contains.

This is what I've got for my one-liner:
perl -ne 'chomp; my @f = split /\s+/; print "$f[1]\n"; while ($f[1] =~ /([A-Z][a-z]?)(\d*)/g) { print "\t$1\t$2\n" }'
My question is: is a while loop the only way to walk through the variable number of groups? I feel like I could write it into the regular expression itself, so that the variable number of groups gets inserted directly into an array or a hash and I can drop the while loop... is this possible?
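For what it's worth, a global match (`/g`) evaluated in list context returns every captured group at once, so the loop can be dropped. Below is a minimal sketch, not a definitive solution. The helper name `parse_formula` is hypothetical, and assigning the result to a hash assumes each element symbol appears only once per formula (a repeated symbol would overwrite the earlier count).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the flat list
# (symbol, count, symbol, count, ...) that a /g match
# produces when it is evaluated in list context.
sub parse_formula {
    my ($formula) = @_;
    return $formula =~ /([A-Z][a-z]?)(\d*)/g;
}

# Pairs of (symbol, count) load directly into a hash.
# Assumption: no symbol is repeated within one formula.
my %count = parse_formula('C9H12N2O6');
print "$_\t$count{$_}\n" for sort keys %count;
```

Assigning to an array instead of a hash keeps the original order of the elements and any repeated symbols.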

Another question would be, if there is only one atom of an element, then there wouldn't be any output for the second group, but can I convert that empty output into a zero string, "inline"?
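One way to fill in the empty counts "inline" is to pass the list returned by the match through `map`. This is only a sketch: the helper name `element_counts` and its `$default` parameter are hypothetical, and the default value is left to the caller, since the question asks for a zero string while 1 would be the actual count for a lone atom.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a formula into a flat
# (symbol, count, ...) list and replace every empty
# count with $default.
sub element_counts {
    my ($formula, $default) = @_;
    my @flat = $formula =~ /([A-Z][a-z]?)(\d*)/g;
    # Symbols always have a length, so only the empty counts change.
    return map { length($_) ? $_ : $default } @flat;
}

# Assumption: no symbol is repeated within one formula.
my %count = element_counts('CH4N2O', 1);
print "$_\t$count{$_}\n" for sort keys %count;
```

The `length` test is used rather than a plain truth test so that an explicit count of `0` in the input would not be replaced by the default.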

Thanks
Sam
__DATA__
CH4N2O
C9H12N2O6
C5H11NO2
C5H4N4O2
C10H11N4O9P
C10H12N4O6
C5H10O5
C5H12O5
C5H10O5
C27H44O
C1694H2993O101

In reply to Recursive capture of a variable number of elements using regexp by seaver
