Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I need help converting a file from the source format to the target format. Both are shown below:

Source

<businesses>
  <entity name="Retail">
    <description>Products and items available in retail</description>
    <product min="0" max="3">
      <description>Product information</description>
      <item name="Title">
        <description>Title</description>
        <text maxlength="80" size="65" />
      </item>
    </product>
  </entity>
  <item name="Patrol">
    <description>Title</description>
    <text maxlength="80" size="65" />
  </item>
  <entity name="Sports">
    <description>Sports Items</description>
    <product min="0" max="3">
      <item name="Title">
        <description>Title</description>
        <text maxlength="80" size="65" />
      </item>
    </product>
  </entity>
</businesses>

Target

<businesses>
  <block name="Retail" min="0" max="3">
    <description>Products and items available in retail</description>
    <description>Product information</description>
    <item name="Title">
      <description>Title</description>
      <text maxlength="80" size="65" />
    </item>
  </block>
  <item name="Patrol">
    <description>Title</description>
    <text maxlength="80" size="65" />
  </item>
  <block name="Sports" min="0" max="3">
    <description>Sports Items</description>
    <item name="Title">
      <description>Title</description>
      <text maxlength="80" size="65" />
    </item>
  </block>
</businesses>

The code snippet at the end of this post attempts the conversion.

It converts the beginning tags (example below) for the first instance only, but the conversion must happen at every occurrence:

<entity name="Retail">
  <description>Products and items available in retail</description>
  <product min="0" max="3">

To

<block name="Retail" min="0" max="3">
  <description>Products and items available in retail</description>

Also, the corresponding end tags need to be replaced as below:

</product> </entity>

To

</block>

Please advise.

########### Process the Input string ############
my $replstr = "<block ";
## The $strFile holds the source string
if ($strFile =~ m/<product (.*)?>/) {
    $range_rep = $1;
    $strFile =~ m/<product [^>]+>/;
    $str2relpace = "$`$&";    # This is the string to be replaced
    if ($str2relpace =~ m/<\/(.*)?></) {
        $matchStr = "$'";
    } else {
        $matchStr = $str2relpace;
    }
    if ($` =~ m/<description>.*<\/description>/) {
        $desc = $&;
    }
    $desc = "\n" . $desc if ($desc);
    $` =~ m/<item (.*)?>/;
    $elementName = $1;
    $eleVal = $1 if ($elementName =~ m/\"(.*)\"/);
    # Constructing the replace string.
    $replstr .= $elementName . " " . 'location="' . "$eleVal\" " . "$range_rep" . '>';    # This is the replacing string
    $strFile =~ s/\Q$matchStr\E/$replstr$desc/g;
    #$strFile =~ s/<\/product><\/entity>/<\/block>/ig;
    print "\n\n\n The output is ================= \n $strFile \n \n ==================\n";
}

READMORE tags added by Arunbear

Re: String Manipulation Help Required
by rev_1318 (Chaplain) on Jun 29, 2005 at 07:25 UTC
    Don't try to write your own XML parser/transformer. Use an existing one. Use XSLT, or, if you want to stay in Perl, use one of the many XML-parsing modules such as XML::Parser or XML::Twig.

    Paul
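For readers who want to stay in Perl, here is a minimal sketch of the module route using XML::Twig. It is only an illustration, not something posted in the thread: the sample input is a shortened version of the source document, and it assumes every entity carries at most one direct product child whose min and max attributes should move up to the new block element.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened sample input (an assumption, trimmed from the question's source).
my $xml = <<'XML';
<businesses>
  <entity name="Retail">
    <description>Products and items available in retail</description>
    <product min="0" max="3">
      <description>Product information</description>
      <item name="Title"><description>Title</description></item>
    </product>
  </entity>
  <item name="Patrol"><description>Title</description></item>
</businesses>
XML

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        # Called once each <entity> element has been completely parsed.
        entity => sub {
            my ($t, $entity) = @_;
            if (my $product = $entity->first_child('product')) {
                # Copy min/max up to the entity, when present.
                for my $name (qw(min max)) {
                    my $value = $product->att($name);
                    $entity->set_att($name => $value) if defined $value;
                }
                # Remove the <product> wrapper but keep its children in place.
                $product->erase;
            }
            # Rename <entity> to <block>.
            $entity->set_tag('block');
        },
    },
);

$twig->parse($xml);
my $out = $twig->sprint;
print $out;
```

The order in which attributes are written on the block element is decided by XML::Twig and may differ from the order shown in the target, which makes no difference to an XML parser.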

      Seconded. Perl is my tool of choice, but for this problem XSLT is the right way.

      Update: Added stylesheet
      <?xml version='1.0' ?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <xsl:output method="xml" indent="yes" />
        <xsl:preserve-space elements="*" />
        <xsl:template match="/businesses">
          <xsl:copy>
            <xsl:for-each select="./entity">
              <block>
                <xsl:attribute name="name">
                  <xsl:value-of select="./@name" />
                </xsl:attribute>
                <xsl:attribute name="min">
                  <xsl:value-of select="./child::product/@min" />
                </xsl:attribute>
                <xsl:attribute name="max">
                  <xsl:value-of select="./child::product/@max" />
                </xsl:attribute>
                <xsl:for-each select=".//description">
                  <xsl:if test="name(./parent::*)!='item'">
                    <xsl:copy>
                      <xsl:value-of select="." />
                    </xsl:copy>
                  </xsl:if>
                </xsl:for-each>
                <xsl:for-each select=".//item">
                  <xsl:copy-of select="." />
                </xsl:for-each>
              </block>
            </xsl:for-each>
          </xsl:copy>
        </xsl:template>
      </xsl:stylesheet>


      holli, /regexed monk/
Re: String Manipulation Help Required
by gopalr (Priest) on Jun 29, 2005 at 07:44 UTC

    TMTOWTDI

    $strFile='
    <businesses>
      <entity name="Retail">
        <description>Products and items available in retail</description>
        <product min="0" max="3">
          <description>Product information</description>
          <item name="Title">
            <description>Title</description>
            <text maxlength="80" size="65" />
          </item>
        </product>
      </entity>
      <item name="Patrol">
        <description>Title</description>
        <text maxlength="80" size="65" />
      </item>
      <entity name="Sports">
        <description>Sports Items</description>
        <product min="0" max="3">
          <item name="Title">
            <description>Title</description>
            <text maxlength="80" size="65" />
          </item>
        </product>
      </entity>
    </businesses>
    ';
    $strFile=~s#<entity ([^>]+)>\s*(.+?)<product (min="[^"]+" max="[^"]+")>#<block $1 $3>\n$2#gs;
    $strFile=~s#</product>\s*</entity>#</block>#gs;
    print "\n$strFile";

    Thanks,
    Gopal.R

      Thanks Gopal.R

      Your solution is of great help. Thanks.

      The regex failed when "<entity></entity>" is changed to "<item></item>". This was attempted with the intention that all first-level elements have the same name.

      The error can be overcome if the "</item>" tag is not allowed to occur in the matched string, something like below:

      $strFile=~s#<entity ([^>]+)>\s*(.+?)<i>negating regex here </i><product (min="[^"]+" max="[^"]+")

      Please advise on a solution, as I am a novice with regexes.

      Thanks in advance.
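One possible shape for the "negating regex" asked about here is a negative lookahead tested before every character, the same idiom that appears in prasadbabu's reply further down. The sketch below is an illustration only: the sample data is a shortened, hypothetical version of the input with the first-level elements renamed to item, and it assumes min always precedes max on the product tag. It remains a brittle regex approach compared with a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical, shortened input: first-level elements are all named <item>.
my $xml = <<'XML';
<businesses>
<item name="Retail">
<description>Retail items</description>
<product min="0" max="3">
<item name="Title"><text size="65" /></item>
</product>
</item>
<item name="Patrol"><text size="65" /></item>
<item name="Sports">
<description>Sports Items</description>
<product min="0" max="3">
<item name="Title"><text size="65" /></item>
</product>
</item>
</businesses>
XML

(my $out = $xml) =~ s{
    <item \s+ ([^>]+) >             # capture 1: attributes of the outer item
    \s*
    ( (?: (?! </item> ) . )*? )     # capture 2: text that never crosses a closing item tag
    <product \s+ (min="[^"]*" \s+ max="[^"]*") \s* >   # capture 3: min and max
}{<block $1 $3>\n$2}gsx;

# A closing product tag directly followed by a closing item tag ends a block.
$out =~ s{</product>\s*</item>}{</block>}g;

print $out;
```

Because the lookahead refuses to step over a closing item tag, an item that has no product of its own (such as "Patrol" or the inner "Title" items) can no longer borrow the product tag of a later element, so it is left untouched.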

Re: String Manipulation Help Required
by gellyfish (Monsignor) on Jun 29, 2005 at 11:01 UTC

    Others have suggested using XSLT but I thought it might be useful to actually show you how - here is the stylesheet:

    <?xml version="1.0"?>
    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="xml" />
      <xsl:template match="/">
        <businesses>
          <xsl:apply-templates />
        </businesses>
      </xsl:template>
      <xsl:template match="entity">
        <block>
          <xsl:attribute name="name">
            <xsl:value-of select="@name" />
          </xsl:attribute>
          <xsl:attribute name="min">
            <xsl:value-of select="product/@min" />
          </xsl:attribute>
          <xsl:attribute name="max">
            <xsl:value-of select="product/@max" />
          </xsl:attribute>
          <xsl:copy-of select="description" />
          <xsl:copy-of select="product/*" />
        </block>
      </xsl:template>
      <xsl:template match="item">
        <xsl:copy-of select="." />
      </xsl:template>
    </xsl:transform>
    And of course code in Perl, first using XML::XSLT:
    use XML::XSLT;

    my $xslt = XML::XSLT->new ('tr.xslt', warnings => 1);

    $xslt->transform ('source.xml');
    print $xslt->toString;

    $xslt->dispose();
    Secondly using XML::LibXSLT:
    use XML::LibXSLT;
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $xslt = XML::LibXSLT->new();

    my $source = $parser->parse_file('source.xml');
    my $style_doc = $parser->parse_file('tr.xslt');

    my $stylesheet = $xslt->parse_stylesheet($style_doc);

    my $results = $stylesheet->transform($source);

    print $stylesheet->output_string($results);

    /J\

Re: String Manipulation Help Required
by anonymized user 468275 (Curate) on Jun 29, 2005 at 07:19 UTC
    The first thing I notice is that when matching whitespace in regexps you should use '\s' instead of a literal ' '.

    For example: \<product\s+ to match the start of the product tag.
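A small sketch of why this matters: a literal space only matches a space, while \s+ also matches a newline or a tab after the tag name. The three sample strings are made up for illustration.

```perl
use strict;
use warnings;

# The same start tag written with a space, a newline, and a tab after the name.
my @tags = (
    '<product min="0" max="3">',
    "<product\n    min=\"0\" max=\"3\">",
    "<product\tmin=\"0\" max=\"3\">",
);

# grep in scalar context returns the number of matching elements.
my $literal_hits = grep { /<product / }   @tags;
my $class_hits   = grep { /<product\s+/ } @tags;

print "literal space matched $literal_hits of 3, \\s+ matched $class_hits of 3\n";
# prints: literal space matched 1 of 3, \s+ matched 3 of 3
```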

    One world, one people

Re: String Manipulation Help Required
by prasadbabu (Prior) on Jun 29, 2005 at 07:22 UTC

    I tried another way using a regex, though this can probably be done more efficiently.

    $str = '<businesses>
      <entity name="Retail">
        <description>Products and items available in retail</description>
        <product min="0" max="3">
          <description>Product information</description>
          <item name="Title">
            <description>Title</description>
            <text maxlength="80" size="65" />
          </item>
        </product>
      </entity>
      <item name="Patrol">
        <description>Title</description>
        <text maxlength="80" size="65" />
      </item>
      <entity name="Sports">
        <description>Sports Items</description>
        <product min="0" max="3">
          <item name="Title">
            <description>Title</description>
            <text maxlength="80" size="65" />
          </item>
        </product>
      </entity>
    </businesses>';
    $str =~ s/(<entity)([^>]+)>((?:(?!(?:<product)).)*)(<product (min="[^"]*" max="[^"]*")>)/<block$2 $5>$3$4/gsi;
    $str=~ s/<\/product>\s*<\/entity>/<\/block>/gsi;
    print "$str";

    This gives the following output.

    <businesses>
      <block name="Retail" min="0" max="3">
        <description>Products and items available in retail</description>
        <product min="0" max="3">
        <description>Product information</description>
        <item name="Title">
          <description>Title</description>
          <text maxlength="80" size="65" />
        </item>
      </block>
      <item name="Patrol">
        <description>Title</description>
        <text maxlength="80" size="65" />
      </item>
      <block name="Sports" min="0" max="3">
        <description>Sports Items</description>
        <product min="0" max="3">
        <item name="Title">
          <description>Title</description>
          <text maxlength="80" size="65" />
        </item>
      </block>
    </businesses>


    Prasad