in reply to Recursive capture of a variable number of elements using regexp

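When used in list context with the /g modifier, a regex match returns a list of all the captured substrings - see perlre (the m// entry in perlop covers this as well). Since you have two terms to capture here (element and abundance), you could store the results straight into a hash, with each element as a key and its abundance as the value. Consider the following: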

    $_ = 'CH4N2O';
    print $_,"\n";
    %hash = /([A-Z][a-z]?)(\d*)/g;
    while ( ($key, $value) = each %hash) {
        $value ||= 1;
        print "\t$key\t$value\n";
    }

Note I also changed your grouping a little.

Update: If your chemical formulas encode structural information (e.g. HOH for water), then an element that appears more than once will have its hash key clobbered, and only the last abundance survives. You can, of course, substitute an array for the hash, and compensate appropriately. Thanks jwkrahn for reminding me to include a warning.
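
For what it's worth, here is a minimal sketch of the array variant mentioned in the update. The sub names parse_formula and count_atoms are made up for illustration and are not from the original post; the regex is the same one used above. It keeps the captures as ordered pairs, then sums repeated elements instead of overwriting them. It assumes formulas without parentheses or charges.

```perl
use strict;
use warnings;

# Hypothetical helper: return [element, count] pairs in order of appearance.
sub parse_formula {
    my ($formula) = @_;
    # In list context with /g, the match returns a flat list of captures:
    # (element, count, element, count, ...)
    my @flat = $formula =~ /([A-Z][a-z]?)(\d*)/g;
    my @pairs;
    while ( my ($element, $count) = splice @flat, 0, 2 ) {
        # An empty count means a single atom.
        push @pairs, [ $element, $count || 1 ];
    }
    return @pairs;
}

# Hypothetical helper: total atoms per element, summing repeats.
sub count_atoms {
    my ($formula) = @_;
    my %total;
    $total{ $_->[0] } += $_->[1] for parse_formula($formula);
    return %total;
}

my %atoms = count_atoms('CH3CH2OH');
print "$_\t$atoms{$_}\n" for sort keys %atoms;
```

With CH3CH2OH this should print C 2, H 6 and O 1, i.e. the same totals as the sum formula C2H6O. If the order of the groups matters, use the pairs from parse_formula directly rather than the summed hash.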

Re^2: Recursive capture of a variable number of elements using regexp
by jwkrahn (Abbot) on Apr 09, 2009 at 18:31 UTC

    If I remember my chemistry correctly, and I probably don't, can't the same element appear more than once in a formula? If so, using a hash would drop some of the elements.

      As long as the formulas are sum formulas, there should be no problem: the atoms of each element are summed up, so no element appears twice in the same formula. So it should be fine with C2H6O.

      If the formula tries to represent some kind of molecular structure, you may be right: CH3CH2OH

      (Both formulas represent ethanol).

      What linuxer said. Given the sample data, I was assuming the OP was just interested in Hill Order formulas. Technically speaking though, you are correct, and I will admonish appropriately.
Re^2: Recursive capture of a variable number of elements using regexp
by seaver (Pilgrim) on Apr 09, 2009 at 18:08 UTC
    Thanks kennethk!