in reply to problem with removing something in XML file
Don't know why your regular expression is so complicated.
Assuming that all <REF > statements are always complete on a single line...
XML File Before
</S></TEXT><TEXT><S Entail="142" s_id="0"> Annan urges return to democracy in <REF C-ENTID="Nepal" EXT="Nepal" ID="104" STYPE="PROPNAME">Nepal</REF></S> <S Entail="138-139-142" s_id="1"> UN Secretary General Kofi Annan on Tuesday expressed deep concern over events in <REF A-CLASS="No-Reference" A-REFTYPE="Entity" C-ENTID="Nepal" EXT="Nepal" ID="105" STYPE="PROPNAME">Nepal</REF> and urged a return to democracy, after <REF C-ENTID="King Gyanendra Bir Bikram" COMMENT="Coref direction is forward" EXT="King Gyanendra Bir Bikram" ID="100" STYPE="APNAME"> King Gyanendra Bir Bikram</REF> dismissed <REF A-CLASS="Entity-Entity" A-DIR="Backward" A-RELTYPE="Identity" A-RESTYPE="Intra" A-TYPE="Referential" ANT-ID="105" ID="101"> the country</REF> 's coalition government and imposed an indefinite state of emergency. </S><S Entail="138-139-143" s_id="2">

perl one-liner (on DOS)
perl -pibak -e "s/<\/?REF.*?>//ig" junk.txt

Result:
</S></TEXT><TEXT><S Entail="142" s_id="0"> Annan urges return to democracy in Nepal</S> <S Entail="138-139-142" s_id="1"> UN Secretary General Kofi Annan on Tuesday expressed deep concern over events in Nepal and urged a return to democracy, after King Gyanendra Bir Bikram dismissed the country 's coalition government and imposed an indefinite state of emergency. </S><S Entail="138-139-143" s_id="2">

Sandy
UPDATE: This also assumes that there is no embedded ">" inside a REF tag (for example, in an attribute value).
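
If either assumption does not hold (a REF tag wraps onto a second line, or an attribute value contains ">"), something along these lines might work instead. This is only a sketch, not a replacement for a real XML parser: strip_ref_tags is a made-up helper name, the sample string is invented for illustration, and it assumes attribute values are always properly quoted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): remove <REF ...> and
# </REF> tags from a string, keeping the text between them.
sub strip_ref_tags {
    my ($xml) = @_;
    # After "REF", consume either a whole quoted attribute value (which
    # may contain ">") or any single character other than ">" or a quote.
    # A newline is just another character here, so a tag may span lines.
    $xml =~ s{</?REF\b(?:"[^"]*"|'[^']*'|[^>"'])*>}{}gi;
    return $xml;
}

# Invented sample: the first tag is split across two lines and the
# second has a ">" inside an attribute value.
my $sample = qq{in <REF EXT="Nepal"\nID="104">Nepal</REF> after }
           . qq{<REF COMMENT="a > b" ID="100">the King</REF> spoke};

print strip_ref_tags($sample), "\n";
```

To run this over a file, the whole file has to be in one string rather than read line by line (perl's -0777 switch does that for one-liners). The \b after REF also keeps the pattern from eating a tag whose name merely starts with REF.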