
Hi Vautrin,

Thanks for your kind reply.

Actually, I'm working at a company as a trainee programmer doing text-file to SGML/XML/HTML conversion and validation. What I asked about as an example applies not only to HTML but to all markup languages.

Secondly, I'm working with plain text files that have short tags applied to them, and I convert them into a markup language (XML or SGML) under a DTD specification.

Input:

[[tx]][[nm]]Murugesan[[/nm]]is trainee perl programmer.[[/tx]]
[[tx1]][[nm]]Murugesan[[/nm]]is trainee perl programmer.[[/tx1]]

Output:

<customer><name type="rrd">Murugesan</name> is trainee perl programmer</customer>
<vendor><name type="integra">Murugesan</name> is trainee perl programmer</vendor>

My next question is this: when converting input text like the above, where short tags are nested inside other short tags, I currently call subroutines from inside the regular expression substitution to produce the output format, as in the code below.

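# Outer tag: wrap the converted contents of [[tx]]...[[/tx]] in <customer>.
s/\[\[tx\]\](.*?)\[\[\/tx\]\]/'<customer>'.txt($1).'<\/customer>'/egsi;

# Inner tag: convert [[nm]]...[[/nm]] inside the captured text.
sub txt {
    my $text = $_[0];    # lexical copy; avoids $a, which sort() uses
    $text =~ s/\[\[nm\]\](.*?)\[\[\/nm\]\]/<name type="rrd">$1<\/name>/gsi;
    return $text;
}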

In the code above I have handled only a single level of nesting, for the name tag. Sometimes there are more levels of nesting. Is this method of calling subroutines inside the regular expression, which leads to nested subroutines, OK, or is there a more efficient method available?

Re: Re: Re: nested tag matching
by Vautrin (Hermit) on Feb 06, 2004 at 14:12 UTC

    First of all, unless you're using a very old version of perl, you can use custom quotes (delimiters) for your regular expressions, so a substitution like the one in your post would be much clearer written in the s{...}{...} form than in the s/.../.../ form.

    It's not really that much more readable, because you have a lot of special characters in your regular expression, but not having to escape every / does make things clearer.
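
    As a sketch of the custom-delimiter form applied to the substitution from the question: the choice of braces as delimiters and the variable names $input, $output and $text are assumptions made for illustration, and the square brackets still need escaping because they are regex metacharacters.

```perl
use strict;
use warnings;

# Inner conversion: [[nm]]...[[/nm]] becomes <name type="rrd">...</name>.
# With s{...}{...} the slashes in [[/nm]] and </name> need no backslash.
sub txt {
    my ($text) = @_;
    $text =~ s{\[\[nm\]\](.*?)\[\[/nm\]\]}{<name type="rrd">$1</name>}gsi;
    return $text;
}

# Sample line taken from the question.
my $input = '[[tx]][[nm]]Murugesan[[/nm]]is trainee perl programmer.[[/tx]]';

# Outer conversion: /e evaluates the replacement as code, so txt() can
# process whatever the outer pattern captured.
(my $output = $input)
    =~ s{\[\[tx\]\](.*?)\[\[/tx\]\]}{'<customer>' . txt($1) . '</customer>'}egsi;

print "$output\n";
```

    The behaviour is the same as with the slash-delimited version; only the quoting changes.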

    You may want to take a lesson from HTML::TreeBuilder or XML::TreeBuilder and create a tree if you're doing a lot of work on your custom tags. Check out the code for XML::TreeBuilder. They basically created it as a subclass of HTML::TreeBuilder, overloaded some of the properties of the Elements, and had a complete system to process and handle XML.
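
    To make the tree idea concrete, here is a minimal hand-rolled sketch. It does not use the HTML::TreeBuilder or XML::TreeBuilder API at all. The function names parse_short_tags and render, the %element and %name_type tables, and the rule that the type attribute of a name depends on its parent tag are all assumptions, inferred from the sample input and output in the question.

```perl
use strict;
use warnings;

# Hypothetical mapping from short tags to output element names.
my %element   = (tx => 'customer', tx1 => 'vendor', nm => 'name');
# Hypothetical rule: <name> gets a type attribute based on its parent short tag.
my %name_type = (tx => 'rrd', tx1 => 'integra');

# Parse "[[tag]]...[[/tag]]" text into a tree of hash nodes:
# { tag => 'tx', children => [ plain strings or nested nodes ] }
sub parse_short_tags {
    my ($text) = @_;
    my $root  = { tag => '', children => [] };
    my @stack = ($root);

    # The capture group makes split return the tag tokens as well as the text.
    for my $token (split /(\[\[\/?\w+\]\])/, $text) {
        next if $token eq '';
        if ($token =~ /^\[\[(\w+)\]\]$/) {            # opening tag
            my $node = { tag => lc($1), children => [] };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node;
        }
        elsif ($token =~ /^\[\[\/(\w+)\]\]$/) {       # closing tag
            my $close = lc($1);
            my $open  = $stack[-1]{tag};
            die "Mismatched closing tag '$close', open tag is '$open'\n"
                unless @stack > 1 && $close eq $open;
            pop @stack;
        }
        else {                                         # plain text
            push @{ $stack[-1]{children} }, $token;
        }
    }
    my $unclosed = $stack[-1]{tag};
    die "Unclosed tag '$unclosed'\n" if @stack > 1;
    return $root;
}

# Walk the tree recursively and emit the output markup.
sub render {
    my ($node, $parent_tag) = @_;
    return $node unless ref $node;                     # text leaf
    my $tag   = $node->{tag};
    my $inner = join '', map { render($_, $tag) } @{ $node->{children} };
    return $inner if $tag eq '';                       # root wrapper, no element
    my $name = $element{$tag} // $tag;                 # unknown tags pass through
    my $attr = '';
    if ($tag eq 'nm' && defined $parent_tag && exists $name_type{$parent_tag}) {
        $attr = ' type="' . $name_type{$parent_tag} . '"';
    }
    return "<$name$attr>$inner</$name>";
}

my $input = '[[tx]][[nm]]Murugesan[[/nm]]is trainee perl programmer.[[/tx]] '
          . '[[tx1]][[nm]]Murugesan[[/nm]]is trainee perl programmer.[[/tx1]]';
print render(parse_short_tags($input)), "\n";
```

    With a tree, any depth of nesting is handled by the same two routines, supporting a new short tag is a table entry rather than another subroutine, and mismatched or unclosed tags are reported instead of being silently skipped.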