in reply to tagged text parser

periapt:

While it does appear to be plain XML, you'll want to be certain. What sorts of situations are going to be considered an error (e.g. tags out of order, mismatched tags, invalid characters, ...)? Are there any special cases you'll need to handle?

The devil is in the details, as they say. So if you keep to the same rules and conventions as XML, things will be pretty simple using the suggestions you've already received. However, if you have to do any special case handling, your life can quickly become difficult.
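
To make the error cases above concrete, here is a minimal sketch of how a strict parser treats a few of them. It assumes XML::LibXML is installed; the xml_error helper and the sample strings are made up for illustration and are not taken from the data in question. Constraints on which elements may follow which are not a well-formedness matter, so the parser alone will not catch those; that would take a DTD or schema.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns undef when the string is well-formed XML,
# otherwise the parser's error message as a string.
sub xml_error {
    my ($text) = @_;
    my $doc = eval { XML::LibXML->load_xml(string => $text) };
    return $doc ? undef : "$@";
}

# Made-up samples, one per kind of problem.
my %samples = (
    'well-formed'      => '<doc><title>ok</title></doc>',
    'mismatched tag'   => '<doc><title>oops</titel></doc>',
    'overlapping tags' => '<doc><b><i>overlap</b></i></doc>',
    'invalid char'     => "<doc>bad \x01 byte</doc>",
);

for my $name (sort keys %samples) {
    my $err = xml_error($samples{$name});
    printf "%-17s %s\n", $name, defined $err ? 'rejected' : 'accepted';
}
```

If the real input needs any of the rejected cases to be accepted, that is the special-case handling to plan for.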

...roboticus

Re^2: tagged text parser
by periapt (Hermit) on Oct 07, 2009 at 16:51 UTC
    Wise words roboticus,

    For now I'm just working on a proof of concept: is this possible in a coherent, stable way? I expect that the finished product would hold to the rules and conventions of XML with regard to validity, but not strictly, since we've already determined that strict adherence (for example, enforcement through a schema) is unworkable.

    I'm hoping that by sticking to XML, I can reduce the amount of post-processing (and the number of edge cases) to a manageable level as I move further along. However, I still have to flesh out the idea some more.

    Thanks

    PJ
    use strict; use warnings; use diagnostics;

      I want to underscore what roboticus said. If you use XML, it can be straightforward and robust. If not, it will likely be hellish at least some of the time, and it will be difficult to explain to customers why it breaks randomly. Part of the point of XML is that if it's not well-formed, it's not XML. It should be considered unacceptable garbage.

      It is not difficult to write, read, parse, and validate XML against DTDs if you use something like XML::LibXML. If you don't intend to go that route, I would strongly recommend not using pretend XML. Use a different, real format that is perhaps easier to sling, like YAML or JSON. To be clear, I'm not trying to be critical; I'm trying to save you (and those who will inherit your code base) pain. Using a fake version of a real data format is like writing your own custom format from scratch, except more confusing.
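
      As a rough illustration of the DTD route, the sketch below parses a string and validates it against a DTD with XML::LibXML. The DTD, the element names (doc, title, para), and the is_acceptable helper are all assumptions invented for the example, not anything from your actual format.

      ```perl
      use strict;
      use warnings;
      use XML::LibXML;

      # Hypothetical DTD: a <doc> holds one <title> followed by one or more <para>.
      my $dtd = XML::LibXML::Dtd->parse_string(q{
          <!ELEMENT doc   (title, para+)>
          <!ELEMENT title (#PCDATA)>
          <!ELEMENT para  (#PCDATA)>
      });

      # Hypothetical helper: returns 1 when the text is well-formed and
      # valid against the DTD above, 0 otherwise.
      sub is_acceptable {
          my ($text) = @_;
          # load_xml dies on input that is not well-formed, hence the eval.
          my $doc = eval { XML::LibXML->load_xml(string => $text) } or return 0;
          return $doc->is_valid($dtd) ? 1 : 0;
      }

      # Valid document.
      print is_acceptable('<doc><title>t</title><para>p</para></doc>'), "\n";
      # Well-formed, but the elements are in the wrong order for the DTD.
      print is_acceptable('<doc><para>p</para><title>t</title></doc>'), "\n";
      # Not well-formed: <doc> is never closed.
      print is_acceptable('<doc><title>t</title>'), "\n";
      ```

      If you want the reason for a failure rather than a yes or no, $doc->validate($dtd) dies with the error message instead of returning false, so it can be wrapped in eval the same way.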