in reply to XML::Twig approach/architecture/design question

As a general question, why? You are going out of your way to be able to write @foo when you could write something like <cross_ref name="foo"/> instead. My only complaint is that it is a bit painfully verbose, but that is standard for XML.

After all, what you are doing is going out of your way to produce a hybrid "XML plus something else" scheme. But XML already seems flexible enough to encompass what you want to do, so why limit yourself to custom tools that handle your almost-XML when you can just use XML instead?

Furthermore if you use straight XML, then you gain more flexibility because you have not just introduced a fragile assumption. For instance what happens in your scheme if some piece of data contains, say, some example Perl code, or an email address? Will you grab the @ which appeared for a reason you don't expect and mangle it? That is the kind of common mistake that people made over and over again which is cited as one of the reasons for standardizing on XML...

Re: Re (tilly) 1: XML::Twig approach/architecture/design question
by mirod (Canon) on Nov 05, 2001 at 21:02 UTC

    XML::Twig provides methods to process this kind of scheme into properly structured XML only because I use it a lot for conversions. When I convert documents from HTML or FrameMaker's MIF into XML I use a 2 step process, first from the original format to some kind of XML (any XML that can easily be created from the document, usually XHTML or something that mirrors the original format), then from that XML to the XML I want using XML::Twig. In that case I often need to add structure to unstructured text.

    I also use those methods when I have already created an XML document and then realize that it lacks some mark-up, for example when adding links from items that have a definition to the definitions.

    These are quite specific problems, but I agree that in general it is best to stick to purely XML schemes.

      I found that, thanks. It looks like it was designed for exactly what I needed, since it also recurses into subelements!

      Can it (your split member) be directed to filter this recursion, e.g.

      <P>it will find @this and <B>recurse to get @this</B> but should be told <literal>not to scan in @here</literal>.</P>
      But maybe I don't need to... there are other parser issues associated with things like <literal> and I'll bring it up on another thread.

      —John

Re: Re (tilly) 1: XML::Twig approach/architecture/design question
by John M. Dlugosz (Monsignor) on Nov 05, 2001 at 21:39 UTC
    Why? Because it's so difficult to type!!! One of the things I loathe about doing docs in HTML is the constant need to surround words with <code> tags. It breaks the flow of typing, or becomes a chore to add afterwards.

    Why do we have "enhanced" meaning of <code> here in PM? By your argument we should all be writing &amp; everywhere instead of letting PM take care of that chore.

    As I said originally, what I really want is a markup system that's easy to type. However, as discussions went here, I'm indeed using a "hybrid XML-something else" scheme because the infrastructure is just fine in XML, and it's only the paragraph formatting that needs to be more keyboard-friendly. So, I convert these shortcuts to XML as part of the processing.

    Re: "For instance what happens in your scheme if some piece of data contains, say, some example Perl code, or an email address? Will you grab the @ which appeared for a reason you don't expect and mangle it?"

    An @ is only significant if it appears after a non-word character, so the RE should ignore foo@bar.com. If I were quoting some data in a listing, it would be like the code tags here on PM: everything in it is literal. If I had a common need to use @ in free text at the beginning of a word, I would not have chosen it for this task. If I did need to once, it would be no more difficult than typing an open-square-bracket here on PM. But I also plan on having <lit> tags that take everything up to the close tag literally, like <code> does here, but without changing the formatting.
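
    A minimal sketch of that rule, written in Python rather than Perl and not taken from the actual tool. The pattern and the name find_xrefs are assumptions for illustration: the @ counts only when the character before it is not a word character.

```python
import re

# Assumed pattern: "@" is significant only at the start of a word,
# i.e. when it is NOT preceded by a word character.
XREF = re.compile(r"(?<!\w)@(\w+)")

def find_xrefs(text):
    """Return the names written in @-notation in a run of plain text."""
    return XREF.findall(text)

sample = "See @foo, or mail foo@bar.com about (@bar)."
print(find_xrefs(sample))
```

    Under this pattern the email address is skipped, but Perl code such as "my @list" would still match, which is the case the literal tags are meant to cover.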

    If all else fails and I do it by mistake, I'll see the warning "xxx is not resolved as a cross-reference" when I build the doc set. And it won't be "mangled" if one slips through, because the text is never altered. It would have an inappropriate link attached if the word happened to match a real name, but it will still read correctly.
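
    The build-time check could look something like this Python sketch. The function names and the set of known names are hypothetical; only the wording of the warning comes from the text above.

```python
import re

# Assumed pattern: "@" only counts when it starts a word.
XREF = re.compile(r"(?<!\w)@(\w+)")

def unresolved(text, known_names):
    """Names written as @name that have no known cross-reference target."""
    return [name for name in XREF.findall(text) if name not in known_names]

def build_warnings(text, known_names):
    """One warning per unresolved name; the text itself is never modified."""
    return [f"{name} is not resolved as a cross-reference"
            for name in unresolved(text, known_names)]

for warning in build_warnings("Use @split, not @bogus.", {"split"}):
    print(warning)
```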

    I agree that XML is flexible, standard, etc. and a hybrid is not good for defining a data persistence or interchange mechanism. But that's not what I'm doing: I'm designing an authoring mechanism.

    With that in mind, you can see why I laughed when I saw “You are going out of your way to be able to write @foo when you could write something like <cross_ref name="foo"/> instead.”

    —John

      I think there was a misunderstanding then.

      From what you said, it looked like you were taking documents that were sort of but not quite XML and converting them into HTML with some of your "not quite" conversions as part of that. Which would mean that you were using a hybrid scheme behind the scenes.

      But XML is horrible to type from scratch. For that I either would define a set of editor macros so that you can type it without doing most of the typing, or I would create a mechanism for producing it from some markup. But in either case I would be strongly inclined to have a true XML intermediate document before passing it to anything like a standard XML processing library or tool. To do otherwise loses most of the reasons for having XML in there at all, and seriously limits what kinds of markup rules you can have for the human.

        That's what I was thinking in general terms, earlier on: write tools that revolve around an XML representation, and as another step have a human-markup translator that only has to translate it to XML. Writing the XML by hand can be done now before that other part is ready.

        But, if it's only the "body" or paragraph narrative part that I'm worried about, a few simple things like the @ for code will get me most of the way there with very little work. Internally, I'm transforming @word into <xref> first, then passing it on for further processing. The step that actually figures out the cross references will see xref tags that were generated from @-notation and those that were typed out the long way with no distinction between how they were originally typed.
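
        As a rough illustration of that first step, here is a Python sketch using xml.etree.ElementTree instead of XML::Twig. It is not the actual implementation: the function names, the literal tag names, and the choice to wrap the word as <xref>word</xref> are all assumptions. It recurses into subelements and leaves the content of literal elements alone.

```python
import re
import xml.etree.ElementTree as ET

XREF = re.compile(r"(?<!\w)@(\w+)")   # assumed: "@" only at the start of a word
LITERAL_TAGS = {"lit", "literal"}     # assumed names for "do not scan" elements

def split_text(text):
    """Split text into (leading_text, [(name, following_text), ...])."""
    parts = XREF.split(text)          # [text, name, text, name, text, ...]
    return parts[0], list(zip(parts[1::2], parts[2::2]))

def make_xrefs(pairs):
    """Build one <xref> element per name, keeping the text that follows it."""
    elems = []
    for name, following in pairs:
        xref = ET.Element("xref")
        xref.text = name
        xref.tail = following
        elems.append(xref)
    return elems

def expand(elem):
    """Rewrite @word as <xref>word</xref> in elem and its descendants."""
    if elem.tag in LITERAL_TAGS:
        return                        # content of literal elements is untouched
    children = list(elem)
    for child in children:
        expand(child)
    rebuilt = []
    if elem.text:
        elem.text, pairs = split_text(elem.text)
        rebuilt.extend(make_xrefs(pairs))
    for child in children:
        rebuilt.append(child)
        if child.tail:                # text after a child belongs to elem
            child.tail, pairs = split_text(child.tail)
            rebuilt.extend(make_xrefs(pairs))
    elem[:] = rebuilt

src = ("<P>it will find @this and <B>recurse to get @this</B> but should be "
       "told <literal>not to scan in @here</literal>.</P>")
root = ET.fromstring(src)
expand(root)
print(ET.tostring(root, encoding="unicode"))
```

        After this pass a later step sees only xref elements and cannot tell whether they were typed as @word or written out by hand.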

        I could do these kinds of things with one program, write out another file, and then feed that to another program. But it's easy to do that same pipeline one "twig" at a time under one program.

        But you do inspire me to add an attribute to the document that specifies whether it is strict XML or contains the hybrid extensions. That way the straight form can be saved with the same file extension, and because the extensions are not processed unless enabled, I get backward compatibility when I come up with new markup extensions. That is, documents written without extension X in mind will not have their meanings altered.
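
        One possible shape for that switch, sketched in Python. The attribute name "extensions" and the extension name "at-xref" are made up for illustration; listing the enabled extensions by name is one way to keep older documents from being reinterpreted when a new shortcut is added.

```python
import xml.etree.ElementTree as ET

def enabled_extensions(root):
    """Read the hypothetical 'extensions' attribute as a set of names.

    A document without the attribute is treated as strict XML.
    """
    return set(root.get("extensions", "").split())

strict = ET.fromstring("<doc><P>mail me @home</P></doc>")
hybrid = ET.fromstring('<doc extensions="at-xref"><P>see @foo</P></doc>')

for doc in (strict, hybrid):
    if "at-xref" in enabled_extensions(doc):
        print("expand @-notation before further processing")
    else:
        print("treat @ as ordinary text")
```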

        —John