murugu has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I have asked a similar question before, about converting tab-separated text into elements.

Now I'm facing two problems. I have somewhat managed the first one, but I'm unable to solve the second.

The first one is:

My input file is tab-separated:

Input:

    <thead>
    abc	bcd	def
    abc	bcd	def
    abc	bcd	def
    </thead>

Output:

    <thead>
    <row><col>abc</col><col>bcd</col><col>def</col></row>
    <row><col>abc</col><col>bcd</col><col>def</col></row>
    <row><col>abc</col><col>bcd</col><col>def</col></row>
    </thead>

I did it by using subs_text(qr{(.+?)\n}m,"row") and then, for each row, subs_text(qr{(.+?)\t}m,"col"). The problem is that the first two columns get converted but the last one does not, because there is no tab after it. I managed to convert the last column by taking the last child of the row, which is text, and wrapping it in a "col" element. Am I doing this right? I think there must be an easier way. If so, how?
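The cause of the missing last column can be seen without XML::Twig at all. The sketch below (plain core Perl; the variable names are only for illustration) compares the lazy "everything up to a tab" pattern with two alternatives that do not require a trailing tab: a pattern matching runs of non-tab, non-newline characters, and split.

```perl
use strict;
use warnings;

# One line of the sample input: three cells separated by tabs, no trailing tab.
my $line = "abc\tbcd\tdef";

# Lazy match up to a tab: every capture must be followed by a tab,
# so the last cell ("def") is never captured.
my @with_tab = $line =~ /(.+?)\t/g;

# Match runs of characters that are neither tab nor newline instead:
# the last cell no longer needs a terminating tab.
my @runs = $line =~ /([^\t\n]+)/g;

# split() treats the tab as a separator, so it also returns the last cell.
my @fields = split /\t/, $line;

print scalar(@with_tab), " ", scalar(@runs), " ", scalar(@fields), "\n";   # prints "2 3 3"
```

Presumably the same [^\t\n]+ style of pattern could be passed to subs_text in place of (.+?)\t so that the last column is wrapped in the same pass, but that has not been checked against the original data here.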

My second question is:

What if the same content already contains some elements? For example:

Input:

    <thead>
    abc	<ex>bcd</ex>	<lp>def</lp>
    abc	<ex>bcd</ex>	def
    abc	bcd	def
    </thead>

Output:

    <thead>
    <row><col>abc</col><col><ex>bcd</ex></col><col><lp>def</lp></col></row>
    <row><col>abc</col><col><ex>bcd</ex></col><col>def</col></row>
    <row><col>abc</col><col>bcd</col><col>def</col></row>
    </thead>

In the input above, each "\n" separates rows, but I am not able to convert the input into the required output. I don't know how to select the content up to a tab. The problem only occurs when a tab sits between two elements.
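One way around the "tab between two elements" problem is to stop working on individual text nodes and instead work on the serialized content of the thead as a single string, where inline markup such as <ex>bcd</ex> is just part of a cell. The sketch below assumes that no tab or newline ever occurs inside an inline element; rows_from_inner_xml is a hypothetical helper, not part of XML::Twig. With XML::Twig, the inner string can probably be obtained with the element's xml_string method and written back with set_inner_xml, but check the XML::Twig documentation for your version.

```perl
use strict;
use warnings;

# Turn the serialized content of a <thead> (text plus inline markup)
# into <row>/<col> markup: one row per line, one col per tab-separated cell.
# Assumption: tabs and newlines never occur inside an inline element.
sub rows_from_inner_xml {
    my ($inner) = @_;
    my @rows;
    for my $line (split /\n/, $inner) {
        next unless $line =~ /\S/;                       # skip blank lines
        (my $clean = $line) =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
        my @cols = map { "<col>$_</col>" } split /\t/, $clean;
        push @rows, '<row>' . join('', @cols) . '</row>';
    }
    return join "\n", @rows;
}

# The second example from the question, as it would appear inside <thead>.
my $inner = "abc\t<ex>bcd</ex>\t<lp>def</lp>\n"
          . "abc\t<ex>bcd</ex>\tdef\n"
          . "abc\tbcd\tdef\n";

print "<thead>\n", rows_from_inner_xml($inner), "\n</thead>\n";
```

Because the markup is treated as opaque text, the existing <ex> and <lp> elements end up inside the right <col> without any special handling.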

Please give me suggestions for the above questions.

Thanks in advance

--Murugesan.

Re: AGAIN XML::Twig tag conversion
by mirod (Canon) on May 14, 2004 at 08:34 UTC

    I think you are tackling this problem the wrong way. You are using an XML tool to work on non-XML data. XML::Twig can help there, but as much as it costs me to admit it, it can't perform miracles ;--(

    Where does the data come from? The proper time to add tags to the abc bcd def lines is _before_ you create the XML, or before these lines get into the XML. You don't necessarily have to create the whole row/col structure, but at least tag each cell. This will be trivial in a non-XML context, and this way you won't have to fight with the data.

    If this is really the content you have to work with, then add a first pass that tags those columns: take the content of your thead element, split it with regexps, and print the result to create proper XML. You will have to deal with escaping markup characters, but that will be simpler than what you are trying to do at the moment.
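The "first pass" suggested above, applied to raw lines before they become XML, might look like the following minimal sketch. It assumes you have the plain tab-separated lines in hand; xml_escape and tag_line are hypothetical helper names, and only the three characters &, < and > are escaped, which is enough for element content.

```perl
use strict;
use warnings;

# Escape the characters that matter in XML element content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so the entities below are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Tag one raw, tab-separated line before it is embedded in XML.
sub tag_line {
    my ($line) = @_;
    chomp $line;
    my @cells = split /\t/, $line, -1;    # limit -1 keeps trailing empty cells
    return '<row>'
         . join('', map { '<col>' . xml_escape($_) . '</col>' } @cells)
         . '</row>';
}

print tag_line("a&b\tx<y\tdef\n"), "\n";
# prints: <row><col>a&amp;b</col><col>x&lt;y</col><col>def</col></row>
```

Doing this before parsing means XML::Twig only ever sees well-formed row/col markup, so no text-node surgery is needed afterwards.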

Re: AGAIN XML::Twig tag conversion
by chanio (Priest) on May 15, 2004 at 05:28 UTC
    Try something like this:

    perl -e '$tx = "aaa \t bbb \t ccc"; @let = split(/\t/, $tx); print "<row>"; s/\s//g, print "<col>$_</col>" foreach (@let); print "</row>\n"'

    Do this after getting each row. split and join deal with the things in the middle of lists, OK?
