Re: delete lines till

Try:

s/[^\n]+\{TT\}\nD.+?(\{TT\}|$)/$1/gs

Explained:

s/ starts a substitution, because we want to replace the {TT}/D blocks with "nothing" (which means just deleting them)

[^\n]+ selects all text (everything that is not a newline character, \n) before the {TT}, as you said "...delete all chars from the line which has text {TT}". Remove this part if you want the characters before the {TT} to stay.

\{TT\}\nD matches your "if there is a char D below the {TT}" = A {TT} followed by a newline followed by a D

.+? matches everything from the newline following the D, but the ? says that we want to match as few characters as possible (without it, the match would run from here to the last possible end point)

(\{TT\}|$) finally selects the next {TT} and stops the match at this point. The |$ means that we also accept "end of string" as a match. The last {TT}\nD block wouldn't be deleted otherwise. You could also add [^\n]+ after the ( to keep the whole line which holds the next {TT}

/$1 holds the replacement string for the earlier match, in this case the first ( ) block - which is our "beginning of next block" marker

/gs has two options: g for "replace all" and s for "make . also match \n" which is important for the .+? - block - it won't work over newlines otherwise.
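To see the pieces above working together, here is a small sketch that runs the substitution on made-up records. The sample data and the variable names ($data, $stripped, $kept) are my own assumptions for illustration, not the original poster's file. It applies both the regex as posted and the variant with [^\n]+ added after the ( so you can compare what survives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records, shaped like the data discussed in this thread.
my $data = <<'END';
S 1 {TT}
D
{TAG} 1
{DATE} 000101
S 2 {TT}
R
{TAG} 2
{DATE} 000101
S 3 {TT}
D
{TAG} 3
{DATE} 000101
END

# The regex as posted: only the next {TT} itself is captured and put back,
# so the text in front of it on the same line is deleted too.
(my $stripped = $data) =~ s/[^\n]+\{TT\}\nD.+?(\{TT\}|$)/$1/gs;

# Variant with [^\n]+ inside the capture group: the whole line that
# holds the next {TT} is captured and put back.
(my $kept = $data) =~ s/[^\n]+\{TT\}\nD.+?([^\n]+\{TT\}|$)/$1/gs;

print "--- as posted ---\n", $stripped;
print "--- whole line kept ---\n", $kept;
```

With this sample, the first result begins with a bare {TT} line while the second begins with "S 2 {TT}". In both cases the last D block is removed through the |$ branch, and because $ also matches just before a trailing newline, that final newline is left behind as an empty line at the end.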

This solution will do what you want as long as you can get the data into a variable. My personal choice would be this short way as long as the amount of data is below 25% of the memory your script may use. Assuming you can give it 1 GB of RAM, that would allow about 250 MB of data to be processed, maybe more.
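Getting the data into a variable could look roughly like the sketch below. The sub name delete_d_blocks_in_file and the command-line handling are hypothetical, chosen only for this example. It slurps the whole input file by undefining $/ locally, applies the substitution, and writes the result to a second file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read $in completely, delete the {TT}/D blocks,
# and write what is left to $out.
sub delete_d_blocks_in_file {
    my ($in, $out) = @_;

    # Slurp mode: with $/ undefined, one read returns the whole file.
    my $data = do {
        open my $in_fh, '<', $in or die "Cannot open $in: $!";
        local $/;
        <$in_fh>;
    };

    $data =~ s/[^\n]+\{TT\}\nD.+?(\{TT\}|$)/$1/gs;

    open my $out_fh, '>', $out or die "Cannot open $out: $!";
    print {$out_fh} $data;
    close $out_fh or die "Cannot close $out: $!";
    return;
}

# Only run when both file names are given on the command line.
delete_d_blocks_in_file(@ARGV) if @ARGV == 2;
```

Assuming the script is saved as strip.pl, it would be called as: perl strip.pl input.txt output.txt. The whole file sits in memory at once (plus a modified copy while the substitution runs), which is why the size limit mentioned above matters.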


Re^2: delete lines till by Anonymous Monk on Aug 25, 2009 at 10:46 UTC
#!/usr/bin/perl
while(<DATA>){
    if (/\{TT\}/ .. /^\{TAG\}/) {
        unless (/^\{(TT\|TAG)\}/) {
            $deletestrings = $_;
            #if($_ =~ m/^D$/){
            print $_;
            open FILE, '>list.txt';
            print FILE $_;
            close FILE;
            $_ = '' if index( $_, "$deletestrings" ) >= 0;
        }
    }
}
__DATA__
S 9912290449 00005941^B{TT}
D
{TAG} 9912290449
{PUBLICATION} THE OS
{DATE} 000101
S 9912290450 00005941^B{TT}
R
{TAG} 9912290450
{DATE} 000101
{TDATE} Saturday, January 1, 2000
S 9912290451 00005941^B{TT}
D
{TAG} 9912290451
{DATE} 000101
{TDATE} Saturday, January 1, 2000

Now the above writes the lines which are between {TT} and {TAG}. The output is

S 9912290449 00005941^B{TT}
D
S 9912290450 00005941^B{TT}
R
S 9912290451 00005941^B{TT}
D

How can I write to the file only the entries that have the character D? In the file, only

S 9912290449 00005941^B{TT}
D
S 9912290451 00005941^B{TT}
D

should be written.
Re^3: delete lines till by Anonymous Monk on Aug 26, 2009 at 03:59 UTC
Actually the above statement prints all the lines

S 9912290449 00005941^B{TT}
D
S 9912290450 00005941^B{TT}
R
S 9912290451 00005941^B{TT}
D

But I want to print only the lines which have 'D' and the line above each of them. The output should be something like this

S 9912290449 00005941^B{TT}
D
S 9912290451 00005941^B{TT}
D