Ananda has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, here is the problem statement: an XML string contains a pattern that occurs repeatedly, back to back, like

"<name>abc</name><name>def</name>"

This pattern can repeat any number of times.

The whole run needs to be replaced with a single element such as "<names>abc,def,ghi</names>".

Can this be achieved with a search and replace, or is there another approach? Please advise. Thanks in advance.

Ananda

Re: replacing continuously occurring pattern
by CountZero (Bishop) on Apr 16, 2005 at 09:34 UTC
    Several options are open:
    • Painstakingly go through your XML file line by line with a regex, capturing everything between <name></name> tags, pushing each match onto an array, and finally outputting the array, joined with commas, between <names></names> tags.

      This may work, but it is fragile: it breaks if a tag spans multiple lines in your XML file, if your tags contain other tags, and so on.

    • It is much better to use a module like XML::Parser, which parses your XML file and lets you extract whatever you want from it.
    • Or you could go the XSLT way and transform your XML file into another XML file. Excellent XSLT modules exist, such as XML::Sablotron or XML::LibXSLT, or you can use XML::XSLT::Wrapper, which provides a consistent interface to various XSLT engines.
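A minimal sketch of the first (regex) option, assuming the input is held in a single string and that <name> elements never contain attributes, nested tags, comments, or CDATA. Instead of working line by line, it finds each run of adjacent <name> elements, collects their contents into an array, and emits that array joined with commas inside one <names> element. The sub name collapse_names is hypothetical, and whitespace between elements of a run is dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every run of adjacent <name>...</name>
# elements with a single <names>a,b,...</names> element.
# Assumes no attributes, nested tags, comments, or CDATA inside <name>.
sub collapse_names {
    my ($xml) = @_;
    $xml =~ s{
        ( <name> [^<]* </name>                  # first element of a run
          (?: \s* <name> [^<]* </name> )* )     # any further adjacent elements
    }{
        # Copy the matched run before running another match on it.
        my $run = $1;
        # Collect the text of each <name> in the run into an array...
        my @names = $run =~ m{<name>([^<]*)</name>}g;
        # ...and emit them comma-separated inside one <names> element.
        '<names>' . join(',', @names) . '</names>';
    }gex;
    return $xml;
}

print collapse_names('<doc><name>abc</name><name>def</name><name>ghi</name></doc>'), "\n";
# prints <doc><names>abc,def,ghi</names></doc>
```

It shares the weaknesses described in the first bullet; a real XML parser remains the safer choice.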

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: replacing continuously occurring pattern
by hv (Prior) on Apr 16, 2005 at 15:39 UTC

    It's generally not considered a good idea to manipulate XML with regular expressions - dedicated XML parsing modules tend to do a better job.

    That said, here's one way to do it:

    1 while $text =~ s{ <names?> ([^<]*) </names?> <name> ([^<]*) </name> }{<names>$1,$2</names>}x;

    This assumes that the 'name' elements cannot contain nested XML elements (each [^<]* stops at the first '<'). It also isn't particularly efficient, since it makes repeated passes over the string, but it should be fine if your strings are no longer than a few KB or so.
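One caveat worth testing: if this substitution is run with the /g modifier, a run of four <name> elements is folded in the first pass into two separate <names> elements (<names>a,b</names><names>c,d</names>), which the pattern can no longer join because it expects a literal <name> after the first element. Without /g, each pass merges only the leftmost pair, so runs of any length collapse. The sketch below wraps that variant in a hypothetical merge_names sub so it can be tried on sample input.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner, run without the /g modifier.
sub merge_names {
    my ($text) = @_;
    # Each pass merges the leftmost <name>/<names> element and the <name>
    # directly after it into one <names> element; "1 while" repeats the
    # substitution until a pass finds nothing left to merge.
    1 while $text =~ s{ <names?> ([^<]*) </names?> <name> ([^<]*) </name> }{<names>$1,$2</names>}x;
    return $text;
}

print merge_names('<name>a</name><name>b</name><name>c</name><name>d</name>'), "\n";
# prints <names>a,b,c,d</names>
```

Note that a lone <name> element with no adjacent <name> is left unchanged rather than turned into <names>.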

    Hugo

Re: replacing continuously occurring pattern
by Tanktalus (Canon) on Apr 16, 2005 at 16:48 UTC

    My favourite XML method ... XML::Twig! ;-)

    There is probably an easier way to do this with XML::Twig using twig handlers, but those confuse the heck out of me, so I try to avoid them ;-) I find this method easier to understand, even if it is a bit longer and probably slower and more memory-hungry. To me, it has a much higher degree of WUD (thanks, friedo, for that term!) :-)

Re: replacing continuously occurring pattern
by friedo (Prior) on Apr 16, 2005 at 09:34 UTC
    Something like this ought to do the trick:

    $xml =~ s/<(.*?)name>/<$1names>/g;

    CountZero is right; I misread the original question. I thought the OP simply wanted to rename the "name" tags to "names". Oops.

      Are you sure that works? When I tried it, I got <names>abc</names><names>def</names> as a result, which is not what was requested.

      CountZero