
If your paragraphs go from "blabla" to "blabla", one trick is to set the input record separator, $/, so that each read returns a record ending in "blabla" instead of the usual newline "\n". Perhaps something like:
    use warnings;
    use strict;
    ...
    local $/ = "blabla";
    while (<DATA>) {
        s/TEXT TO BE REMOVED// unless m/^5/;
        print;
        ...
    }
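To see what changing $/ does on its own, here is a minimal, self-contained sketch. The read_records helper and the sample string are invented for illustration, and it reads from an in-memory filehandle rather than DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records that <FH> yields once the input
# record separator is set to a custom string.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;               # restored when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;                 # list context: one element per record
    close $fh;
    return @records;
}

# Made-up sample text with three "paragraphs", each ending in "blabla".
my @records = read_records("1 foo blabla5 bar blabla2 baz blabla", "blabla");
print "[$_]\n" for @records;
```

Each returned record still carries its trailing separator, which is why the substitution in the loop above can run on whole paragraphs at a time.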

Re^5: Substitute in a subparagraph
by guyov1 (Novice) on Oct 19, 2008 at 07:53 UTC
    Ok, I think I need to be more specific. I have now attached the exact data. I need to match the paragraphs whose headlines contain (string)clk(digit in brackets) and leave the "QUALIFIED" in them. In all other paragraphs I should delete it. This example has 3 paragraphs. The first one matches, so its "QUALIFIED" should stay; the third one does not, so its "QUALIFIED" should be deleted.
    qclk[6] INPUT (
        ! "asdk fd sasd"
        VALID (
            late_lead 3 ar qclk slope 20
            late_lead 3 af qclk slope 04
            early_dn 8 ar qclk slope 6
            early_up 6 af qclk slope 6
        )
        cext %0.00394757
        cmax %0.005504
        QUALIFIED
    )
    clkout_qclk_61[3] OUTPUT (
    )
    clkout_qclk_61[2] OUTPUT (
        REQUIRED (
            earlyp 0.5 br qclk clm(2)
            latel_up 5 bf qclk clk(2)
        )
        REQUIRED (
            early_lead_dn 0.004 bf qclk clkdom(2)
            late_trail_dn 0.005 br qclk clkdom(2)
        )
        cext %0.0647336
        max_ceff_up %0.187
        QUALIFIED
    )
      Here's my go using a flag.
      #!/usr/bin/perl
      use warnings;
      use strict;

      my $del_qual;
      while (<DATA>){
          if (/^\S/){
              if (/^\w+clk\[\d+\]/){
                  $del_qual = 0;
              }
              else{
                  $del_qual = 1;
              }
          }
          next if $del_qual and /\s+QUALIFIED/;
          print qq{$_};
      }
      __DATA__
      qclk[6] INPUT (
          ! "asdk fd sasd"
          VALID (
              late_lead 3 ar qclk slope 20
              late_lead 3 af qclk slope 04
              early_dn 8 ar qclk slope 6
              early_up 6 af qclk slope 6
          )
          cext %0.00394757
          cmax %0.005504
          QUALIFIED
      )
      clkout_qclk_61[3] OUTPUT (
      )
      clkout_qclk_61[2] OUTPUT (
          REQUIRED (
              earlyp 0.5 br qclk clm(2)
              latel_up 5 bf qclk clk(2)
          )
          REQUIRED (
              early_lead_dn 0.004 bf qclk clkdom(2)
              late_trail_dn 0.005 br qclk clkdom(2)
          )
          cext %0.0647336
          max_ceff_up %0.187
          QUALIFIED
      )
      qclk[6] INPUT (
          ! "asdk fd sasd"
          VALID (
              late_lead 3 ar qclk slope 20
              late_lead 3 af qclk slope 04
              early_dn 8 ar qclk slope 6
              early_up 6 af qclk slope 6
          )
          cext %0.00394757
          cmax %0.005504
          QUALIFIED
      )
      clkout_qclk_61[3] OUTPUT (
      )
      clkout_qclk_61[2] OUTPUT (
          REQUIRED (
              earlyp 0.5 br qclk clm(2)
              latel_up 5 bf qclk clk(2)
          )
          REQUIRED (
              early_lead_dn 0.004 bf qclk clkdom(2)
              late_trail_dn 0.005 br qclk clkdom(2)
          )
          cext %0.0647336
          max_ceff_up %0.187
      )
      I've assumed that each record starts with a non-space character in the first column.

      I personally believe that since the text you're operating on seems highly structured, and more precisely a proper language, a fully reliable solution would be to write a parser. Perhaps one exists already. However, if you're fairly sure that your data is regular enough, then you may want to slurp it all at once and process it with somewhat naive regexen. The following program follows such an approach and does work as expected on your sample, but be warned that it may fail on the full data.
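A minimal sketch of that slurp-and-regex idea, not necessarily the program the poster had in mind. The strip_qualified name is hypothetical, and the code assumes that every record starts in the first column, that each record ends with a ")" alone on a line in the first column, and that QUALIFIED sits on a line of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: take the whole file as one string and drop the
# QUALIFIED line from every record whose header is not <word>clk[<digits>].
sub strip_qualified {
    my ($text) = @_;

    # Split after each ")" that stands alone at the start of a line,
    # keeping that line attached to the record it closes.
    my @records = split /(?<=^\)\n)/m, $text;

    for my $record (@records) {
        next if $record =~ /\A\w+clk\[\d+\]/;       # header matches: keep it
        $record =~ s/^[ \t]*QUALIFIED[ \t]*\n//m;   # otherwise remove the line
    }
    return join '', @records;
}

# Shortened version of the sample data, for illustration only.
my $sample = <<'END';
qclk[6] INPUT (
    cmax %0.005504
    QUALIFIED
)
clkout_qclk_61[2] OUTPUT (
    max_ceff_up %0.187
    QUALIFIED
)
END

print strip_qualified($sample);
```

With a real file, the text could be slurped with something like `my $text = do { local $/; <$fh> };` before calling the helper. Nested parentheses in the first column, or QUALIFIED sharing a line with other tokens, would break these assumptions.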

      --
      If you can't understand the incipit, then please check the IPB Campaign.

      What is the simulation language you are working with? If it is widely used and the sort of editing you want to do is common, there may already be public domain code available.


      Perl reduces RSI - it saves typing
      You may refine this further:
      use warnings;
      use strict;

      local $/ = "\n)\n";
      while (<DATA>) {
          my $got_end = chomp;
          s/\s*QUALIFIED$// unless m/^.*?clk\[\d+\]/;
          print;
          print $/ if $got_end;
      }