I really must learn to clarify my posts. I'm not trying to parse the XML, just clean it before it is picked up on our FTP server by another department. I'm the middle man here - neither the generator of said XML nor the intended recipient. Thanks anyway for your suggestion.

UPDATE - Thanks theguvnor, you're right of course - I am obviously parsing here. I also agree that regex parsing is a complete no-no for any form of long-term XML parsing solution (I use XML::Parser very frequently actually). The program I whipped up was a quick hack of a "fix" program that would be used on XML that I could guarantee would not change format, hence regex parsing is not as scary (perhaps). The thought was that "proper" XML parsing with a reputable parser (i.e. XML::Parser) and then re-writing the XML out was overkill. Then again, perhaps it was a mistake to even post my code (it does work after all), as I should have known I'd be taken out back and beaten with a stick for even mentioning XML and regex in the same breath (*grin*).

Thanks again mate, I do appreciate your answers, as it was obviously a dodgy post judging from the overall lack of response ;)

In reply to Re: Re: Cleaning Files by vek
in thread Cleaning Files by vek
