in reply to Re: Cleaning Files
in thread Cleaning Files

I really must learn to clarify my posts. I'm not trying to parse the XML, just clean it before it is picked up on our FTP server by another department. I'm the middle man here - neither the generator of said XML nor the intended recipient. Thanks anyway for your suggestion.

UPDATE - Thanks theguvnor, you're right of course - I am obviously parsing here. I also agree that regex parsing is a complete no-no for any long-term XML parsing solution (I actually use XML::Parser very frequently). The program I whipped up was a quick hack of a "fix" program, to be used on XML whose format I could guarantee would not change, so regex parsing is (perhaps) not as scary. My thinking was that "proper" XML parsing with a reputable parser (e.g. XML::Parser), followed by re-writing the XML out, was overkill. Then again, perhaps it was a mistake to even post my code (it does work, after all), as I should have known I'd be taken out back and beaten with a stick for even mentioning XML and regex in the same breath (*grin*).
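
For comparison, here is a minimal sketch of what the "proper" parse-and-rewrite route could look like. It is not the code from the original post, and it uses Python's standard xml.etree.ElementTree purely as a stand-in for XML::Parser. The helper names prune_empty and clean_xml are hypothetical, and "empty" is assumed to mean an element with no attributes, no children, and no non-whitespace text.

```python
import xml.etree.ElementTree as ET


def prune_empty(element):
    """Recursively remove child elements that carry no content.

    Assumption: an element is "empty" when it has no attributes,
    no children, and no non-whitespace text.
    """
    previous = None  # last sibling that was kept
    for child in list(element):
        prune_empty(child)  # clean the subtree first, so nested empties collapse
        if len(child) == 0 and not child.attrib and not (child.text or "").strip():
            # Text following the removed element belongs to the parent,
            # so hand it to the previous kept sibling (or the parent itself).
            if child.tail:
                if previous is None:
                    element.text = (element.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            element.remove(child)
        else:
            previous = child


def clean_xml(source):
    """Parse an XML string, drop empty elements, and serialise it again."""
    root = ET.fromstring(source)
    prune_empty(root)
    return ET.tostring(root, encoding="unicode")
```

The trade-off is the one described above: this is more machinery than a one-line substitution, but it does not depend on the exact layout of the incoming file.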

Thanks again mate, I do appreciate your answers as it was obviously a dodgy post judging from the lack of overall response ;)

Re: Re: Re: Cleaning Files
by theguvnor (Chaplain) on Jan 24, 2002 at 22:15 UTC
    Ah, but in order to clean it, you must do some kind of parsing. (That's what the process of reading a text stream in order to perform operations that will output a transformed text stream is.) My point is that regular expressions alone are prone to failure.

    But if you're looking to do an RE-only solution, and you're ONLY looking to remove EMPTY tags, then something involving a match on >< with a look-behind for < and a look-ahead for > might do...
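
A rough sketch of the regex-only idea, under stated assumptions: "empty" here means an attribute-free opening tag followed directly (or after whitespace only) by its matching closing tag. Python's re module only supports fixed-width look-behind, so this sketch swaps the look-arounds for a backreference to the tag name. The names EMPTY_ELEMENT and strip_empty_elements are hypothetical, and none of this is the code from the original post.

```python
import re

# Assumption: an "empty" element is <name></name> with at most whitespace
# in between and no attributes. \1 forces the closing tag to match the
# opening tag's name.
EMPTY_ELEMENT = re.compile(r"<([A-Za-z_:][\w:.\-]*)\s*>\s*</\1\s*>")


def strip_empty_elements(text):
    """Remove empty elements, repeating until no more are found.

    Repetition handles nesting: removing <b></b> from <a><b></b></a>
    leaves <a></a>, which is itself empty on the next pass.
    """
    while True:
        cleaned, count = EMPTY_ELEMENT.subn("", text)
        if count == 0:
            return cleaned
        text = cleaned
```

This knows nothing about comments, CDATA sections, or self-closing tags such as <b/>, which is the kind of failure mode a regex-only approach is prone to when the input format shifts.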

    Good luck in any event!

    Update: I'm not claiming to have a dictionary definition of parsing; I was just trying to explain how I see the situation: reading the file, separating tags from text, and throwing away the empty tags... in short, parsing :)

    Update: no problem vek, I didn't think it was a dodgy post in the least, and I wasn't trying to beat you down at all - in fact it sounds like you have way more experience with XML parsing than I do! Code on, brother...