I am not disagreeing with your recommendation to use moderation. That is an excellent idea. I am just advocating _extreme_ caution ;--)

Especially when dealing with XML, which is a deceptively simple format.

You can certainly use regexps to write a throw-away hack, which is going to be used only once, on very well known XML data, ideally generated by code you have also written yourself. That's about it! And it doesn't happen that often.

Using regexps on anything else means that sooner or later you will come across something that's completely legal XML, but that completely breaks your code. And believe me, if it is legal XML (and most likely even if it is not) it is bound to pop up in your data. You can have a look at On XML Parsing for just a quick list of what can go wrong.
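
To make that concrete, here is a minimal sketch. It is in Python rather than Perl only because Python's standard library ships an XML parser; the sample document and the two helper names are made up for illustration. The sample is legal XML, yet it uses single quotes, a `>` inside an attribute value, a line break inside a tag, a commented-out element and a CDATA section, each of which trips the naive pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data: every construct below is legal XML.
DOC = """<items>
  <!-- <item name="ghost"/> -->
  <item name='a>b'>first</item>
  <item
     name="second"><![CDATA[<item name="fake">]]></item>
</items>"""


def names_by_regex(text):
    # Naive pattern: assumes double quotes, a single space after the
    # tag name, and that every match is a real element.
    return re.findall(r'<item name="([^"]*)"', text)


def names_by_parser(text):
    # A real parser skips the comment, treats the CDATA as plain text,
    # and does not care about quoting style or whitespace inside tags.
    root = ET.fromstring(text)
    return [item.get("name") for item in root.iter("item")]


print(names_by_regex(DOC))   # ['ghost', 'fake']
print(names_by_parser(DOC))  # ['a>b', 'second']
```

The regexp finds two items that do not exist and misses the two that do. You could patch the pattern for each of these cases, but the list of cases does not end there.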

A last word: if you are dealing with something that is nearly (...) XML, do yourself a favor and use 2 steps: first get from the nearly-thingie to the real stuff, and then use an XML module. It would be even better if you could refuse the data altogether because it is not valid!
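
The two steps could look something like the sketch below. It is again in Python, and it assumes, purely for the sake of the example, that the only defect in the nearly-XML is unescaped `&` characters. The names `repair`, `parse_strict` and `load` are hypothetical. Your own data will have its own defects and need its own repair step.

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does not start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def repair(text):
    """Step 1: turn the nearly-XML into real XML (assumed defect: bare '&')."""
    return BARE_AMP.sub("&amp;", text)


def parse_strict(text):
    """Step 2: use a real parser; raises ET.ParseError if not well-formed."""
    return ET.fromstring(text)


def load(text, allow_repair=True):
    """Parse as-is if possible; otherwise repair once, or refuse the data."""
    try:
        return parse_strict(text)
    except ET.ParseError:
        if not allow_repair:
            raise  # the "refuse the data altogether" option
        return parse_strict(repair(text))


nearly = "<shop><name>Fish & Chips</name><note>R&amp;D</note></shop>"
root = load(nearly)
print(root.findtext("name"))  # Fish & Chips
print(root.findtext("note"))  # R&D
```

The regexp here only patches one known defect in the raw text; everything that needs to understand the structure of the document is left to the parser. Calling `load(nearly, allow_repair=False)` raises the parse error instead, which is the behaviour to prefer whenever you are in a position to send the data back.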


In reply to Re: (tye)Re: Picking the best way.... by mirod
in thread Picking the best way.... by tinman
