in reply to Transforming strange format to XML

I don't recognize the format either, so I'll just provide a substitution to transform the cell tags in a line, according to your example.

    s,\[_CELL_\]\s*(.*?)\s*?(?=\[_CELL_\]|$),<CELL>$1</CELL>,g;

This matches from an opening [_CELL_] up to the next opening [_CELL_] or the end of the line, and sticks in the opening and closing <CELL> tags, swallowing any leading and trailing whitespace.

This substitution may need to be adjusted based on the details of the actual format.
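As a rough illustration of the substitution above, here is a minimal sketch run against a made-up input line. The sample data in `$line` is an assumption, since the original poster's example is not reproduced in this thread; only the substitution itself comes from the post.

```perl
use strict;
use warnings;

# Hypothetical input line (not the original poster's data), with uneven
# whitespace around the cell contents.
my $line = "[_CELL_] first cell   [_CELL_]second cell [_CELL_]  third\n";

# Copy the line, then wrap each cell's text in <CELL>...</CELL>.
# The lazy (.*?) plus the lookahead stops each match at the next
# [_CELL_] marker or at the end of the line.
(my $xml = $line) =~ s,\[_CELL_\]\s*(.*?)\s*?(?=\[_CELL_\]|$),<CELL>$1</CELL>,g;

print $xml;
```

Because the lookahead does not consume the next [_CELL_] marker, the following match can start right at it, which is what lets a single /g pass handle every cell in the line.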

Re: Re: Transforming strange format to XML
by Tortue (Scribe) on Nov 18, 2001 at 00:04 UTC
    I obviously hadn't quite mastered combining positive lookahead and non-greediness (selflessness?) yet. Until now my solution was to use two statements:
        s{\[_(CELL|COLHEAD)_\] (.*)}{<$1>$2</$1>}g;
        s{\s*\[_(CELL|COLHEAD)_\]\s*}{</$1><$1>}g;
    Here it's more complicated because there are several of these tags. It's lamer and slower than yours, so I'll gladly change it, thanks!

    Pauses to think for a while... Hm, with that extra twist I just introduced (not fair, I know), maybe the two-step version isn't slower. Lookahead on something it doesn't know yet could get tricky, maybe.

        s,\[_(CELL|COLHEAD)_\]\s*(.*?)\s*?(?=\[_\1_\]|$),<$1>$2</$1>,g;
    But it seems to work fine, and I don't think I care about speed anyway.
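To compare the two approaches from this reply side by side, here is a small sketch. The sample lines and the helper names `two_step` and `one_step` are hypothetical, and each sample line is assumed to use a single tag type throughout; the regexes themselves are the ones quoted above.

```perl
use strict;
use warnings;

# Hypothetical sample lines; each one repeats a single tag type.
my @lines = (
    "[_COLHEAD_] Name [_COLHEAD_] Price",
    "[_CELL_] Widget   [_CELL_] 9.99",
);

# Two-statement version: wrap the whole line in one element first,
# then turn every remaining marker into a close/open tag pair.
sub two_step {
    my ($s) = @_;
    $s =~ s{\[_(CELL|COLHEAD)_\] (.*)}{<$1>$2</$1>}g;
    $s =~ s{\s*\[_(CELL|COLHEAD)_\]\s*}{</$1><$1>}g;
    return $s;
}

# Single-pass version: the lookahead uses \1, so each match stops
# only at the next marker of the same tag type (or at end of line).
sub one_step {
    my ($s) = @_;
    $s =~ s,\[_(CELL|COLHEAD)_\]\s*(.*?)\s*?(?=\[_\1_\]|$),<$1>$2</$1>,g;
    return $s;
}

for my $line (@lines) {
    my $two = two_step($line);
    my $one = one_step($line);
    print "$one\n";
    print "  (two-step result ", ($two eq $one ? "matches" : "differs"), ")\n";
}
```

One thing to keep in mind with the \1 lookahead: if a line ever mixed tag types, a [_COLHEAD_] entry would run on past a following [_CELL_] marker, so this sketch only applies when a line sticks to one tag type.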