Yes, my intention has never been to roll my own...and I am using Twig in parts of my code. The main reason I am trying to roll my own is that all the XML differencing engines I have encountered share the same logical premise, and I have looked at a bunch of them, from C to Java based. Unfortunately they all have something in common that keeps them from providing what I am truly looking for, which is not to cascade changes through sibling elements when an element is deleted.

What I have found is this: if you have multiple siblings in an element tree, and you delete one element somewhere in the middle and add a new element to the same set of siblings at the same time, the differencing engine doesn't merely remove the deleted element and add the new one. It reports the element below the deleted element as a change to the deleted element, and that then cascades down the siblings, until the newly added element shows up as a change to what was previously the last sibling. This is very difficult to deal with when trying to maintain representations of this data in an RDBMS. Instead of a simple delete record and add record, you end up with multiple changes to existing records cascaded down the siblings...with nothing ever indicating that an element was deleted and an element was added. Somehow all the differencing engines appear to treat sibling element order as a key aspect of watching for changes...thus my intention to try and roll my own.
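
To make the delete-plus-add behaviour described above concrete, here is a minimal sketch, assuming each sibling can be reduced to a canonical string key (tag name plus attributes plus content). The diff_siblings name and the sample host elements are hypothetical, made up purely for illustration, and this is not how any existing differencing engine or Twig works. It compares siblings by content rather than by position, so a removal in the middle does not shift anything below it.

```perl
use strict;
use warnings;

# Compare two sibling lists by content, ignoring position.
# Each sibling is represented by a (hypothetical) canonical string key.
# Returns two array refs: keys only in the old list, keys only in the new list.
sub diff_siblings {
    my ($old, $new) = @_;
    my %in_old = map { $_ => 1 } @$old;
    my %in_new = map { $_ => 1 } @$new;
    my @deleted = grep { !$in_new{$_} } @$old;   # present before, gone now
    my @added   = grep { !$in_old{$_} } @$new;   # present now, absent before
    return (\@deleted, \@added);
}

# Hypothetical sample data: "b" is deleted from the middle, "d" is appended.
my @old = ('<host name="a"/>', '<host name="b"/>', '<host name="c"/>');
my @new = ('<host name="a"/>', '<host name="c"/>', '<host name="d"/>');

my ($deleted, $added) = diff_siblings(\@old, \@new);
print "deleted: @$deleted\n";   # deleted: <host name="b"/>
print "added: @$added\n";       # added: <host name="d"/>
```

The obvious limitation is that identical siblings collapse into one key, and an edited element shows up as one delete plus one add, so a real version would need some stable identity per element (an id attribute, for instance) to tell a change from a replacement.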

The reason I am trying to understand the RegEx is to be able to detect tag patterns without having to know the contents of the tags. I don't want to write a matching pattern for every possible tag...that is possible, but I want it to function regardless of the tag name.

As far as the RegEx goes...what I want is a pattern that matches ^<ANYTHING>$ only, with no attributes. But when you run it against XML that may look like <ANYTHING port="7777"> or <ANYTHING>someValue</ANYTHING> or <ANYTHING></ANYTHING>, a pattern like /^<(.*)>$/ doesn't just get the first example...it also grabs the second, third and fourth. The RegEx I am trying to work out should only grab the bare <ANYTHING>...and it's become harder than I had imagined.

So I have tried variations such as:
if($line =~ /^\s*<(\w+)>[^.+]/)
if($line =~ /^\s*<(\w+)>[^(\w*|\d*|<*)]/)
if($line =~ /^\s*<(\w+)>([^\w*]|[^\d*]|[^<*])/)
I'm just not sure how to overcome this with a RegEx pattern...more complex patterns are easier because you have more items to anchor against...but the simplest tag, <ANYTHING>, is much harder than I thought.
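
For reference, a minimal sketch of one way the bare-tag match could be written, assuming each tag sits on its own line as in the examples above. The bare_open_tag name is hypothetical, and the name character class is a simplification of XML names, not the full rule from the spec. Two things seem to be working against the attempts above: inside [...] the characters . + * ( | ) are literal, so [^.+] means "one character that is not a dot or a plus" and therefore demands a character after the >, and none of the patterns anchor the end of the line with $.

```perl
use strict;
use warnings;

# Return the tag name if $line is nothing but a bare opening tag
# (no attributes, no content, no closing tag); otherwise return undef.
# The name class is an assumption: a letter or underscore, followed by
# word characters, colons, dots or hyphens.
sub bare_open_tag {
    my ($line) = @_;
    return $line =~ /^\s*<([A-Za-z_][\w:.-]*)>\s*$/ ? $1 : undef;
}

for my $line ('<ANYTHING>',
              '<ANYTHING port="7777">',
              '<ANYTHING>someValue</ANYTHING>',
              '<ANYTHING></ANYTHING>') {
    my $tag = bare_open_tag($line);
    printf "%-32s %s\n", $line, defined $tag ? "bare tag: $tag" : "no match";
}
```

Because the name class cannot contain a space, a slash or a >, and the pattern is anchored at both ends, only the first line is reported as a bare tag. The attribute form, the tag with content and the empty open/close pair all fall through to "no match".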

In reply to Re^2: RegEx Against Arbitrary XML Tags by onegative
in thread RegEx Against Arbitrary XML Tags by onegative
