Ucfirst on xml elements

redhotpenguin has asked for the wisdom of the Perl Monks concerning the following question:

Morning Monks, I have several XML documents in which I need to capitalize the first letter of each element name. Rather than going through the laborious process of creating an XML parser, parsing and ucfirst'ing each element, and writing a new document, I am thinking a better approach would be to create a regex which uppercases the first letter that follows '<' and '</'. How can I embed a function in a regular expression to accomplish this? Here is what I have so far as a testbed for the '<' matches:

$string =~ s!\<(.)!\<uc($1)!g;

i.e. I want to turn <foo><bar>blitz</bar></foo> into <Foo><Bar>blitz</Bar></Foo>

TIA


Re: Ucfirst on xml elements by davido (Cardinal) on Aug 29, 2004 at 16:38 UTC
This will uc the first letter following < and </: `$string =~ s!(</?)([^/>])!$1.uc($2)!eg;`

The key is the /e modifier on the substitution, which indicates that the right-hand side is Perl code which needs to be evaluated, rather than just a plain string. Breaking it down into its components:

    s!
        (</?)       # Match and capture '<' followed by optional '/'
        ([^/>])     # Match and capture the next char, as long as
                    # it's not a '>' or '/'.
    !
        $1          # Replace with '<' or '</'.
        . uc($2)    # Concatenate with the uc of $2.
    !egx

This approach is error-prone though. What if the literal text (not found inside a tag) has a '<' character in it? You don't have to write your own parser when CPAN has many already written.

Updated: Fixed issue with '</' case.

Dave
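For anyone who wants to try it, here is a minimal sketch that wraps the substitution above in a sub and runs it on the sample from the question. The name `ucfirst_tags` is made up for illustration and is not from any module. The second call shows the caveat in action: a CDATA section is one place where well-formed XML may carry a raw '<' in its text, and the regex cannot tell it apart from a tag opener.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): uppercase the
# first character after '<' or '</' using the /e substitution.
sub ucfirst_tags {
    my ($string) = @_;
    $string =~ s!(</?)([^/>])!$1.uc($2)!eg;
    return $string;
}

# The sample from the original question.
print ucfirst_tags('<foo><bar>blitz</bar></foo>'), "\n";

# The caveat: the 'y' after the raw '<' inside CDATA is uppercased too,
# because the pattern only looks at the character following '<'.
print ucfirst_tags('<foo><![CDATA[x<y]]></foo>'), "\n";
```

The first call prints `<Foo><Bar>blitz</Bar></Foo>`; the second prints `<Foo><![CDATA[x<Y]]></Foo>`, which silently changes the data rather than a tag name.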
Re^2: Ucfirst on xml elements by ambrus (Abbot) on Aug 29, 2004 at 17:57 UTC
Let me note that `s!(</?)([^/>])!$1\U$2!g;` is a handy shortcut for `s!(</?)([^/>])!$1.uc($2)!eg;`
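A quick sketch to check that the two forms agree, using the sample string from the question. The variable names are only for illustration. It also tries `\u`, which uppercases just the next character; since `$2` holds a single character here, it should give the same result as `\U`.

```perl
use strict;
use warnings;

my $xml = '<foo><bar>blitz</bar></foo>';

# Copy-then-substitute idiom so the original string stays untouched.
(my $with_e = $xml) =~ s!(</?)([^/>])!$1.uc($2)!eg;   # /e with uc()
(my $with_U = $xml) =~ s!(</?)([^/>])!$1\U$2!g;       # \U: uppercase to end of replacement (or \E)
(my $with_u = $xml) =~ s!(</?)([^/>])!$1\u$2!g;       # \u: uppercase the next character only

print "$with_e\n$with_U\n$with_u\n";
```

All three lines of output should read `<Foo><Bar>blitz</Bar></Foo>`. The interpolated forms avoid /e, so the replacement is an ordinary double-quoted string rather than evaluated code.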
Re^2: Ucfirst on xml elements by redhotpenguin (Deacon) on Aug 29, 2004 at 16:45 UTC
That is wonderful, working great. Thank you. The data I am dealing with fortunately has no '<' characters in the literal text. I do have a couple of working parsers from CPAN implemented, but for this simple task I think the regex is the better tool for the job.