redhotpenguin has asked for the wisdom of the Perl Monks concerning the following question:

Morning Monks, I have several XML documents in which I need to capitalize the first letter of each element name. Rather than going through the laborious process of creating an XML parser, parsing and ucfirst'ing each element, and writing a new document, I am thinking a better approach would be to create a regex which uppercases the first letter following '<' and '</'. How can I embed a function in a regular expression to accomplish this? Here is what I have so far as a testbed for the '<' matches:

$string =~ s!\<(.)!\<uc($1)!g;

i.e. I want to turn <foo><bar>blitz</bar></foo> into <Foo><Bar>blitz</Bar></Foo>

TIA

Re: Ucfirst on xml elements
by davido (Cardinal) on Aug 29, 2004 at 16:38 UTC

    This will uc the first letter following < and </:

    $string =~ s!(</?)([^/>])!$1.uc($2)!eg;

    The key is the /e modifier on the regexp, which indicates that the right-hand side of the substitution regexp is Perl code which needs to be evaluated, rather than just a plain string.
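
To see what /e changes, here is a small sketch that runs the same substitution with and without the modifier on the sample string from the question. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Sample input taken from the question.
my $string = '<foo><bar>blitz</bar></foo>';

# Without /e the right-hand side is an interpolated string,
# so ".uc(" and ")" end up in the output as literal text.
(my $plain = $string) =~ s!(</?)([^/>])!$1.uc($2)!g;

# With /e the right-hand side is evaluated as a Perl expression,
# so uc() is actually called on the captured character.
(my $evaluated = $string) =~ s!(</?)([^/>])!$1.uc($2)!eg;

print "without /e: $plain\n";
print "with /e:    $evaluated\n";   # <Foo><Bar>blitz</Bar></Foo>
```

The copy-then-substitute idiom `(my $copy = $orig) =~ s///` is used so the original string stays untouched for comparison.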

    Breaking it down into its components:

    s!
        (</?)       # Match and capture '<' followed by optional '/'
        ([^/>])     # Match and capture the next char, as long as
                    # it's not a '>' or '/'.
    !
        $1          # Replace with '<' or '</'.
        . uc($2)    # Concatenate with the uc of $2.
    !egx

    This approach is error-prone, though. What if the literal text (not found inside a tag) has a '<' character in it? You don't have to write your own parser when CPAN has many already written.
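
As a rough illustration of that pitfall, the sketch below feeds the substitution a made-up input in which a '<' appears inside a CDATA section, which is one way a literal '<' can occur in well-formed XML. The input string is hypothetical, not taken from the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical input: the CDATA text contains a '<' that does not start a tag.
my $string = '<note><![CDATA[if a <b then stop]]></note>';

# Same substitution as above; it cannot tell tags from text.
(my $result = $string) =~ s!(</?)([^/>])!$1.uc($2)!eg;

print "$result\n";   # the 'b' inside the CDATA text is upper-cased too
```

The regex has no notion of where a tag can legally start, so the text content is altered along with the element names. A real parser would leave the CDATA section alone.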

    Updated: Fixed issue with '</' case.


    Dave

      Let me note that

      s!(</?)([^/>])!$1\U$2!g;

      is a handy shortcut for

      s!(</?)([^/>])!$1.uc($2)!eg;
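
A quick way to check that the two forms agree is to run both on the sample string from the question. This is only a sketch; the third variant, using `\u` (which upper-cases just the next character), is an addition not mentioned in the thread.

```perl
use strict;
use warnings;

my $string = '<foo><bar>blitz</bar></foo>';

# \U upper-cases everything up to \E or the end of the replacement;
# here that is only the single character in $2.
(my $with_escape = $string) =~ s!(</?)([^/>])!$1\U$2!g;

# The /e form calls uc() explicitly.
(my $with_eval = $string) =~ s!(</?)([^/>])!$1.uc($2)!eg;

# \u upper-cases only the character that follows it.
(my $with_u = $string) =~ s!(</?)([^/>])!$1\u$2!g;

print "$with_escape\n$with_eval\n$with_u\n";   # all three print <Foo><Bar>blitz</Bar></Foo>
```

Because `$2` holds exactly one character and sits at the end of the replacement, `\U` and `\u` behave the same here. They would differ if more text followed `$2`.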

      That is wonderful, and it is working great. Thank you. Fortunately, the data I am dealing with has no '<' characters in the literal text. I do have a couple of working parsers from CPAN implemented, but for this simple task I think the regex is the better tool for the job.