delete lines till

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: delete lines till by ikegami (Patriarch) on Aug 25, 2009 at 06:04 UTC
Deal with records, not lines. `local $/ = ''; while (<>) { next if /\A.*\nD$/m; print; }` [download] More readable, more versatile: `local $/ = ''; while (<>) { chomp( my @rec = split /^/m ); my $hdr1 = shift(@rec); my $hdr2 = shift(@rec); my %rec = @rec; next if $hdr2 eq 'D'; print; }` [download]	[reply] [d/l] [select]
Re^2: delete lines till by Anonymous Monk on Aug 25, 2009 at 06:38 UTC
Please tell me what does the line mean? `/\A.*\nD$/m;`. How to store that records with 'D' to a variable?	[reply] [d/l]
Re^3: delete lines till by Anonymous Monk on Aug 25, 2009 at 10:54 UTC
Why don't you install YAPE::Regex::Explain? The regular expression: (?m-isx:\A.\nD$) matches as follows: NODE EXPLANATION ---------------------------------------------------------------------- (?m-isx: group, but do not capture (with ^ and $ matching start and end of line) (case- sensitive) (with . not matching \n) (matching whitespace and # normally): ---------------------------------------------------------------------- \A the beginning of the string ---------------------------------------------------------------------- . any character except \n (0 or more times (matching the most amount possible)) ---------------------------------------------------------------------- \n '\n' (newline) ---------------------------------------------------------------------- D 'D' ---------------------------------------------------------------------- $ before an optional \n, and the end of a "line" ---------------------------------------------------------------------- ) end of grouping ---------------------------------------------------------------------- [download]	[reply] [d/l]
Re: delete lines till by Sewi (Friar) on Aug 25, 2009 at 07:32 UTC
Try: `s/[^\n]+\{TT\}\nD.+?(\{TT\}\|$)/$1/gs` [download] Explained: s/ startes a replace, because we want to replace the {TT}/d - blocks by "nothing" (which means just deleting them) ^\n+ selectes all text (everything which is no newline char = \n) as you said "...delete all chars from the line which has text {TT}", remove this part if you want the chars before the {TT} to stay. \{TT\}\nD matches your "if there is a char D below the {TT}" = A {TT} followed by a newline followed by a D .+? matches everything from the newline following the D, but the ? says that we wan't to match as few as possible chars (missing it would match everything beginning from here) (\{TT\}\|$) finally selects the next {TT} and stops the match before at this point. The \|$ means that we also accept a "End of string" as match. The last {TT}\nD - block wouldn't be deleted otherwise. You could also add ^\n+ after the ( to keep the whole line which holds the next {TT} /$1 holds the replacement string for the earlier match, in this case the first ( ) block - which is our "beginning of next block" marker /gs has two options: g for "replace all" and s for "make . also match \n" which is important for the .+? - block - it won't work over newlines otherwise. This solution will do what you want as long as you could get the data into a variable. My personal choice would be this short way as long as the amount of data is below 25% of the memory your script may eat. Assuming you could give 1 GB of RAM to it would allow 250GB of data being processed, maybe more.	[reply] [d/l]
Re^2: delete lines till by Anonymous Monk on Aug 25, 2009 at 10:46 UTC
#!/usr/bin/perl while(<DATA>){ if (/\{TT\}/ .. /^\{TAG\}/) { unless (/^\{(TT\|TAG)\}/) { $deletestrings = $_; #if($_ =~ m/^D$/){ print $_; open FILE, '>list.txt'; print FILE $_; close FILE; $_ = '' if index( $_, "$deletestrings" ) >= 0; } } } __DATA__ S 9912290449 00005941^B{TT} D {TAG} 9912290449 {PUBLICATION} THE OS {DATE} 000101 S 9912290450 00005941^B{TT} R {TAG} 9912290450 {DATE} 000101 {TDATE} Saturday, January 1, 2000 S 9912290451 00005941^B{TT} D {TAG} 9912290451 {DATE} 000101 {TDATE} Saturday, January 1, 2000 [download] Now the above writes the lines which is between {TT} and {TAG}. The output is `S 9912290449 00005941^B{TT} D S 9912290450 00005941^B{TT} R S 9912290451 00005941^B{TT} D` [download] How to write only two lines which has the character D to the file. In the file: only `S 9912290449 00005941^B{TT} D S 9912290451 00005941^B{TT} D` [download] should be writeen	[reply] [d/l] [select]
Re^3: delete lines till by Anonymous Monk on Aug 26, 2009 at 03:59 UTC
Actually the above statement prints all the lines `S 9912290449 00005941^B{TT} D S 9912290450 00005941^B{TT} R S 9912290451 00005941^B{TT} D` [download] But I want to print only the statemtents which has 'D' and its above line. Yhe output should be something like this `S 9912290449 00005941^B{TT} D S 9912290451 00005941^B{TT} D` [download]	[reply] [d/l] [select]
Re: delete lines till by Anonymous Monk on Aug 25, 2009 at 06:02 UTC
What is the name of this format?	[reply]
Re: delete lines till by bichonfrise74 (Vicar) on Aug 25, 2009 at 19:13 UTC
Try this. Although I'm not sure how this relates to your original question. `#!/usr/bin/perl use strict; my @records = <DATA>; for (0 .. $#records) { print $records[$_] . $records[$_ + 1] if ( $records[$_] =~ /\{TT\}/ ); } __DATA__ S 9912290449 00005941^B{TT} D {TAG} 9912290449 {PUBLICATION} THE OS {DATE} 000101 S 9912290450 00005941^B{TT} R {TAG} 9912290450 {DATE} 000101 {TDATE} Saturday, January 1, 2000 S 9912290451 00005941^B{TT} D {TAG} 9912290451 {DATE} 000101 {TDATE} Saturday, January 1, 2000` [download]	[reply] [d/l]
Re^2: delete lines till by Anonymous Monk on Aug 26, 2009 at 11:55 UTC
`S 9912290449 00005941^B{TT} D S 9912290450 00005941^B{TT} R S 9912290451 00005941^B{TT} D <code/> But I want to print only the statemtents which has 'D' and its above l +ine. Yhe output should be something like this <code> S 9912290449 00005941^B{TT} D S 9912290451 00005941^B{TT} D` [download]	[reply] [d/l]