in reply to delete lines till

Try:
s/[^\n]+\{TT\}\nD.+?(\{TT\}|$)/$1/gs
Explained:
  • s/ startes a replace, because we want to replace the {TT}/d - blocks by "nothing" (which means just deleting them)
  • ^\n+ selectes all text (everything which is no newline char = \n) as you said "...delete all chars from the line which has text {TT}", remove this part if you want the chars before the {TT} to stay.
  • \{TT\}\nD matches your "if there is a char D below the {TT}" = A {TT} followed by a newline followed by a D
  • .+? matches everything from the newline following the D, but the ? says that we wan't to match as few as possible chars (missing it would match everything beginning from here)
  • (\{TT\}|$) finally selects the next {TT} and stops the match before at this point. The |$ means that we also accept a "End of string" as match. The last {TT}\nD - block wouldn't be deleted otherwise. You could also add ^\n+ after the ( to keep the whole line which holds the next {TT}
  • /$1 holds the replacement string for the earlier match, in this case the first ( ) block - which is our "beginning of next block" marker
  • /gs has two options: g for "replace all" and s for "make . also match \n" which is important for the .+? - block - it won't work over newlines otherwise.
  • This solution will do what you want as long as you could get the data into a variable. My personal choice would be this short way as long as the amount of data is below 25% of the memory your script may eat. Assuming you could give 1 GB of RAM to it would allow 250GB of data being processed, maybe more.

    Replies are listed 'Best First'.
    Re^2: delete lines till
    by Anonymous Monk on Aug 25, 2009 at 10:46 UTC
      #!/usr/bin/perl while(<DATA>){ if (/\{TT\}/ .. /^\{TAG\}/) { unless (/^\{(TT|TAG)\}/) { $deletestrings = $_; #if($_ =~ m/^D$/){ print $_; open FILE, '>list.txt'; print FILE $_; close FILE; $_ = '' if index( $_, "$deletestrings" ) >= 0; } } } __DATA__ S 9912290449 00005941^B{TT} D {TAG} 9912290449 {PUBLICATION} THE OS {DATE} 000101 S 9912290450 00005941^B{TT} R {TAG} 9912290450 {DATE} 000101 {TDATE} Saturday, January 1, 2000 S 9912290451 00005941^B{TT} D {TAG} 9912290451 {DATE} 000101 {TDATE} Saturday, January 1, 2000
      Now the above writes the lines which is between {TT} and {TAG}. The output is
      S 9912290449 00005941^B{TT} D S 9912290450 00005941^B{TT} R S 9912290451 00005941^B{TT} D
      How to write only two lines which has the character D to the file. In the file: only
      S 9912290449 00005941^B{TT} D S 9912290451 00005941^B{TT} D
      should be writeen
        Actually the above statement prints all the lines
        S 9912290449 00005941^B{TT} D S 9912290450 00005941^B{TT} R S 9912290451 00005941^B{TT} D
        But I want to print only the statemtents which has 'D' and its above line. Yhe output should be something like this
        S 9912290449 00005941^B{TT} D S 9912290451 00005941^B{TT} D