in reply to Escaping XML Reserved characters

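The following looks like it might do what you want. Note, though, that it will *not* work if an element has character data and child elements mixed together: as soon as an element contains a subelement, the code recurses into the children and drops any text sitting next to them.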

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
$| = 1;

my $str = "<Root>
  <Node1>Data with < in it</Node1>
  <Node2>Data with > in it</Node2>
  <Node3>
      <SubNode1>'one'</SubNode1>
      <SubNode2><\"two\"></SubNode2>
  </Node3>
</Root>";

Match($str);

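sub Match
{
    my $str = shift;
    my $c = shift || 0;    # current nesting depth, used for indentation
    # Match each <tag>...</tag> pair at this level; \1 requires the
    # closing tag to have the same name as the opening tag.
    while ($str =~ m!<([_a-zA-Z0-9]+)>(.+?)</\1>!sg)
    {
        my $tag = $1;
        my $tmp = $2;
        printf("%s<%s>\n", "\t" x $c, $tag);
        # If there are subelements, recurse, otherwise,
        # just encode and print the data.
        if ($tmp =~ m!<([_a-zA-Z0-9]+)>(.+?)</\1>!s)
        {
            Match($tmp, $c + 1);
        }
        else
        {
            # Escape only the XML-reserved characters, and keep the data
            # out of the format string so a literal % is printed as-is.
            $tmp = encode_entities($tmp, '<>&"');
            printf("%s%s\n", "\t" x ($c + 1), $tmp);
        }
        printf("%s</%s>\n", "\t" x $c, $tag);
    }
}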

Outputs:

<Root>
        <Node1>
                Data with &lt; in it
        </Node1>
        <Node2>
                Data with &gt; in it
        </Node2>
        <Node3>
                <SubNode1>
                        'one'
                </SubNode1>
                <SubNode2>
                        &lt;&quot;two&quot;&gt;
                </SubNode2>
        </Node3>
</Root>
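
To make the mixed-content caveat concrete, here is a minimal sketch. The `collect` helper and the sample string are hypothetical, not part of the script above. It uses the same matching approach as Match(), with tag names matched by a bracketed character class, but records what it visits instead of printing it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same matching logic as Match(), but it records
# "tag=text" for every leaf element it reaches instead of printing.
sub collect
{
    my ($str, $seen) = @_;
    while ($str =~ m!<([_a-zA-Z0-9]+)>(.+?)</\1>!sg)
    {
        my ($tag, $body) = ($1, $2);
        if ($body =~ m!<([_a-zA-Z0-9]+)>(.+?)</\1>!s)
        {
            # Only the child elements are visited here; any text
            # before or after them in $body is never recorded.
            collect($body, $seen);
        }
        else
        {
            push @$seen, "$tag=$body";
        }
    }
    return $seen;
}

# Text and a child element mixed inside the same parent.
my $mixed = "<Node>before <Child>inner</Child> after</Node>";
my $seen  = collect($mixed, []);
print join(", ", @$seen), "\n";
```

Only `Child=inner` is recorded; the "before " and " after" text inside `<Node>` is lost, which is why the approach is only safe when each element holds either text or child elements, not both.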

(Edited for formatting)

Re: Re: Escaping XML Reserved characters
by Bunsoy (Beadle) on Sep 15, 2003 at 07:57 UTC