in reply to Re^4: Delete till end of line to another string
in thread Delete till end of line to another string

First, may I suggest that you annotate your code with comments and streamline it a bit, because I found it quite incomprehensible. For instance, if I look at

s/[\\|<]$//g; s/^[\\|<]//g; s/<//g;
You first delete a \, | or < from the end and the beginning of $_, and then you erase all < from the whole of $_. Since you erase every < from $_ anyway, you don't need to handle it in the special cases 'begin' and 'end' as well. Also, if you want to completely erase certain characters from a string, tr is more readable, so your code would become
s/[\\|]$//g; s/^[\\|]//g; tr/<//d;
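To convince yourself that the shorter version behaves the same way, you could run both side by side on a few test strings. The following is only a sketch: the sub names clean_original and clean_simplified and the sample strings are made up for illustration and are not taken from your script or your data.

```perl
use strict;
use warnings;

# The cleanup as originally posted, wrapped in a sub (name is hypothetical).
sub clean_original {
    my ($s) = @_;
    for ($s) { s/[\\|<]$//g; s/^[\\|<]//g; s/<//g; }   # $_ is aliased to $s
    return $s;
}

# The simplified cleanup: strip a leading/trailing \ or |, then delete every <.
sub clean_simplified {
    my ($s) = @_;
    for ($s) { s/[\\|]$//g; s/^[\\|]//g; tr/<//d; }
    return $s;
}

# Made-up sample inputs, only to compare the two versions.
my @samples = ('<HEAD|', '|a<b\\', '\\<x<', 'plain');
for my $in (@samples) {
    my $old = clean_original($in);
    my $new = clean_simplified($in);
    printf "%-8s -> %-6s %s\n", $in, $new, ($old eq $new ? 'same' : 'DIFFERENT');
}
```

If any line reports DIFFERENT for your real data, the simplification is not safe for that input and the original three substitutions should stay.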
Also I find your question self-contradictory. You ask
But for {LEAD} it doesnot creates the proper tag. ... For {LEAD} it creates the tag

So first you say it does not create the tag, then you say it does create the tag. It rather seems to me that the problem is that no opening tag is created for HEADLINE only, or is there still something incorrect with LEAD?

You should put various print statements into your code so that you can see what is going on, in particular after your program has encountered HEADLINE for the first time. I would do the same if I had to debug such a program...
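Such print statements could look roughly like the sketch below. I don't know the structure of your script, so the loop, the {COLUMN}/{HEADLINE}/{LEAD} input lines and the helper debug_msg are all assumptions made for illustration. The helper writes non-printable characters as \xNN, so that stray control characters show up in the trace instead of silently garbling the terminal.

```perl
use strict;
use warnings;

# Hypothetical helper: label a value and make non-printable characters visible.
sub debug_msg {
    my ($label, $value) = @_;
    (my $shown = $value) =~ s/([^\x20-\x7e])/sprintf('\\x%02X', ord $1)/ge;
    return "DEBUG $label: [$shown]";
}

# Made-up input; the real script would read its own data file instead.
my @lines = ("{COLUMN}Nuestro hijos.\n", "{HEADLINE}EVITE LA ENFERMEDAD\n", "{LEAD}C...\n");

my $seen_headline = 0;
for my $line (@lines) {
    if (!$seen_headline && $line =~ /HEADLINE/) {
        $seen_headline = 1;
        warn debug_msg('first HEADLINE, before processing', $line), "\n";
    }
    # ... your existing substitutions would go here ...
    warn debug_msg('after processing', $line), "\n" if $seen_headline;
}
```

Using warn sends the trace to STDERR, so it does not get mixed into the XML that the script writes to STDOUT.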
-- 
Ronald Fischer <ynnor@mm.st>

Re^6: Delete till end of line to another string
by Anonymous Monk on Aug 19, 2009 at 05:24 UTC
    Hi, I am not able to debug the script. When I grep for the <HEADLINE> tag in a file with
    grep '<HEADLINE>' 9702050900.xml
    it prints the following:
    <HEADLINE>EVITE LA ENFERMEDAD PERIODONTAL PRA CTIQUE LA HIGIENE DENTAL PARA CUIDAR SUS ENC�AS</HEADLINE>
    But when I open the file in vi, it displays
    <COLUMN>Nuestro hijos.</COLUMN> + AS</HEADLINE> <LEAD>C......</LEAD>
    The problem is very strange. Can anyone please help me?
      Have a look at the file with a hex viewer. On Unix or other reasonable operating systems, you could use, for instance,
      od -cx FILENAME | less
      Maybe you will find some conspicuous control character, such as an isolated carriage return (0x0D).
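If paging through the od output by eye is too tedious, a few lines of Perl can list the suspicious bytes for you. This is a minimal sketch, not a finished tool: the sub name find_control_bytes is made up, and it assumes that "conspicuous" means any control byte other than tab and newline.

```perl
use strict;
use warnings;

# Return [offset, byte value] pairs for every control byte except tab (0x09)
# and newline (0x0A). A carriage return (0x0D) is reported.
sub find_control_bytes {
    my ($data) = @_;
    my @hits;
    while ($data =~ /([\x00-\x08\x0B-\x1F\x7F])/g) {
        push @hits, [ pos($data) - 1, ord $1 ];
    }
    return @hits;
}

# Usage: perl SCRIPT FILENAME
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    local $/;                 # read the whole file in one go
    my $data = <$fh>;
    close $fh;
    printf "offset %d: 0x%02X\n", @$_ for find_control_bytes($data);
}
```

The reported offsets are byte positions from the start of the file, so they can be looked up directly in the od listing.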

      -- 
      Ronald Fischer <ynnor@mm.st>