in reply to problem with removing something in XML file
As mentioned earlier, just removing the tags is pretty simple -- it can be a one-liner on the command line -- if the <REF...> thing is never split up by a line break. But even if it is, you can just run perl in "file-slurp" mode:

    perl -0777 -pe 's{</?REF[^>]*>}{}g' file.xml > noref.file.xml

It seems like a pretty safe bet that REF tags will never contain a ">" as part of an attribute value, so this approach should suffice.

And there is a funny thing about your sample xml data: the "T" in the "STYPE" attribute of the REF tag is actually a Cyrillic "T" (U+0422), not an ASCII "T". Is that why you're trying to get rid of the REF tags, because they all got corrupted somehow? (It must have been caused by someone trying to do stream-edits on the XML data...) You could just fix that:

    perl -CS -pe 'tr{\x{422}}{T}' < file.xml > fixed.file.xml

(The -CS switch only puts UTF-8 layers on STDIN, STDOUT and STDERR, so the file is fed in on STDIN here rather than named as an argument.)
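For the REF removal, here is a small sketch of why slurp mode matters when an opening tag is broken across lines. The file names and the sample content are made up for illustration; only the two perl invocations come from the discussion above.

```shell
# Hypothetical sample: a REF element whose opening tag is split over two lines.
printf '<P>see <REF STYPE="T"\n ID="r1">ref text</REF> here</P>\n' > sample.xml

# Line-by-line (-p alone): the opening tag never appears whole on one line,
# so only the closing </REF> is stripped and the broken opening tag remains.
perl -pe 's{</?REF[^>]*>}{}g' sample.xml > linewise.xml

# Slurp mode (-0777): the whole file is one string, so [^>]* can cross the
# newline and both the opening and the closing tag are removed.
perl -0777 -pe 's{</?REF[^>]*>}{}g' sample.xml > slurped.xml

cat slurped.xml
```

Comparing linewise.xml with slurped.xml shows the difference: the first still contains the split <REF ...> opening tag, the second does not.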