Substitute in a subparagraph

guyov1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to write a script that searches a file containing many paragraphs. First I need to check each paragraph's headline, and if it matches, I need to substitute a string in that paragraph only. I know how to use m//g for searching and s/// for substituting, but I don't know how to do a "double search": after finding my match, find another match inside it. I need to do this "double search" many times, to match more than one paragraph. OK, here is part of the text I'm looking at:

```
blabla_data[28] OUTPUT (
REQUIRED (
lead_up 0.193118 br fclk clkdom(3)
early_lea clkdom(3)
late_trail_dn -0.084738 br fclk clkdom(3)
late_tra
)
cext %0.00151055
min_ceff_up %0.034
TEXT TO BE REMOVED
)
blabla5 [145] OUTPUT (
REQUIRED (
early qclk clkdom(6)
early_l qclk clkdom(7)
late_trail_dn -0.125163 bf qclk clkdom(7)
late_t(7)
TEXT TO BE REMOVED
)
```

As you can see, there are many paragraphs whose headlines start with "blabla". I want to delete every "TEXT TO BE REMOVED" line except in the paragraphs called "blabla5", and I need these replacements to happen throughout the file. Thanks a lot, Guy


Re: Substitute in a subparagraph by ikegami (Patriarch) on Oct 19, 2008 at 07:50 UTC
As soon as you get into nesting, you really need a parser. Fortunately, some properties of your data language allow us to construct one rather easily.

```perl
use strict;
use warnings;

sub process_para {
   my ($para) = @_;
   if (my ($id) = $para =~ /^(\w+)/) {
      if ($id ne 'blabla5') {
         $para =~ s/^TEXT TO BE REMOVED\n//mg;
      }
   }
   print($para);
}

{
   my ($depth, $buf);
   while (<DATA>) {
      $buf .= $_;
      ++$depth if /\(\s*$/;         # line ends with "(": one level deeper
      --$depth if /^\s*\)\s*$/;     # line is just ")": one level up
      #printf("[%d] %s", $depth, $_);
      if (!$depth) {                # back at top level: paragraph complete
         process_para($buf);
         $buf = '';
      }
   }
   die("Bad nesting\n") if $depth;
}

__DATA__
```

Update: With the new data format, you'll need

```perl
sub process_para {
   my ($para) = @_;
   if (my ($id) = $para =~ /^ [ ] (\S+)/x) {
      if ($id !~ /^.clk\[\d\]\z/) {
         $para =~ s/^ [ ] QUALIFIED [ ]* \n//xmg;
      }
   }
   print($para);
}
```
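The question asks for the edit to be applied to a whole file, not just to `__DATA__`. A minimal sketch of the same depth-counting idea applied to a file on disk, assuming Perl's in-place editing through `$^I` is acceptable: `edit_file_in_place` and the `.bak` backup suffix are hypothetical choices for illustration, and the headline and marker strings come from the question's sample. It assumes balanced parentheses; the excerpt in the question is truncated, so its last paragraph alone would trigger the nesting check.

```perl
use strict;
use warnings;

# Illustrative helper: remove the marker line from every top-level
# paragraph whose first word is not 'blabla5'.
sub process_para {
    my ($para) = @_;
    my ($id) = $para =~ /^(\w+)/;
    $para =~ s/^TEXT TO BE REMOVED\n//mg
        if defined $id && $id ne 'blabla5';
    return $para;
}

# Hypothetical wrapper: rewrites $file in place, keeping the original as
# "$file$backup_suffix". Paragraphs are delimited by counting lines that
# end in "(" and lines that consist only of ")".
sub edit_file_in_place {
    my ($file, $backup_suffix) = @_;
    local ($^I, @ARGV) = ($backup_suffix, $file);
    my ($depth, $buf) = (0, '');
    while (<>) {                  # with $^I set, print goes to the new file
        $buf .= $_;
        ++$depth if /\(\s*$/;
        --$depth if /^\s*\)\s*$/;
        if ($depth == 0) {        # a complete top-level paragraph
            my $out = process_para($buf);
            print $out;
            $buf = '';
        }
    }
    die "Bad nesting in $file\n" if $depth;
    return;
}
```

Usage would be something like `edit_file_in_place('report.txt', '.bak');` (the file name is only an example). Keeping a backup suffix is deliberate: an empty suffix is not supported for in-place editing on every platform.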
Re: Substitute in a subparagraph by GrandFather (Saint) on Oct 19, 2008 at 06:26 UTC
So what does your current code look like? You could show us a sample of your current code. Remember to use `<DATA>` to read the sample you show us from a `__DATA__` section in your sample script. It may be that we can show you a whole lot of good stuff if only we had some code to comment on.

Perl reduces RSI - it saves typing
Re^2: Substitute in a subparagraph by guyov1 (Novice) on Oct 19, 2008 at 06:48 UTC
Exactly the way I posted... I can't post the whole file, but it looks exactly the same; the same structure repeats itself starting at line 1.
Re^3: Substitute in a subparagraph by GrandFather (Saint) on Oct 19, 2008 at 07:13 UTC
I don't want to see more data. I want to see your code. Here's a start:

```perl
use warnings;
use strict;

...

while (<DATA>) {
    ...
}

...

__DATA__
blabla_data[28] OUTPUT (
REQUIRED (
lead_up 0.193118 br fclk clkdom(3)
early_lea clkdom(3)
late_trail_dn -0.084738 br fclk clkdom(3)
late_tra
)
cext %0.00151055
min_ceff_up %0.034
TEXT TO BE REMOVED
)
blabla5 [145] OUTPUT (
REQUIRED (
early qclk clkdom(6)
early_l qclk clkdom(7)
late_trail_dn -0.125163 bf qclk clkdom(7)
late_t(7)
TEXT TO BE REMOVED
)
```

Just fill in the dotted parts.

Perl reduces RSI - it saves typing
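One possible way to fill in the dotted parts, sketched under the assumption that every headline is a line that starts with a word and ends in `OUTPUT (`, and that the marker sits on a line of its own. Instead of counting parentheses, it simply remembers the name of the most recent headline. `filter_text` is a hypothetical name, and the sample is a trimmed version of the question's data, read through an in-memory filehandle instead of `__DATA__` so the sub can be reused.

```perl
use strict;
use warnings;

# Hypothetical filter: drop "TEXT TO BE REMOVED" lines except inside the
# paragraph named 'blabla5'. Assumption: a headline is any line that
# starts with a word and ends in "OUTPUT (".
sub filter_text {
    my ($text) = @_;
    # An in-memory filehandle stands in for <DATA>.
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my ($current, $out) = ('', '');
    while (my $line = <$fh>) {
        # First search: remember which paragraph we are in.
        $current = $1 if $line =~ /^(\w+).*\bOUTPUT\s*\(\s*$/;
        # Second search: skip the marker unless we are inside blabla5.
        next if $line =~ /^TEXT TO BE REMOVED\s*$/ && $current ne 'blabla5';
        $out .= $line;
    }
    close $fh;
    return $out;
}

# Trimmed sample based on the question's data.
my $sample = <<'END';
blabla_data[28] OUTPUT (
REQUIRED (
lead_up 0.193118 br fclk clkdom(3)
)
cext %0.00151055
TEXT TO BE REMOVED
)
blabla5 [145] OUTPUT (
REQUIRED (
early qclk clkdom(6)
)
TEXT TO BE REMOVED
)
END

print filter_text($sample);
```

This state-variable approach works line by line, so it never holds more than one line in memory, but it depends entirely on headlines being recognizable; if other lines could also end in `OUTPUT (`, the depth-counting parser is the safer choice.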
Re^4: Substitute in a subparagraph by repellent (Priest) on Oct 19, 2008 at 07:28 UTC
Re^5: Substitute in a subparagraph by guyov1 (Novice) on Oct 19, 2008 at 07:53 UTC
Re^4: Substitute in a subparagraph by guyov1 (Novice) on Oct 19, 2008 at 07:21 UTC
Re: Substitute in a subparagraph by blazar (Canon) on Oct 19, 2008 at 10:47 UTC
> I know how to use m//g for search and s/// for substituting but I don't know how to do a "double search": after finding my match - finding another match inside.

I personally believe that, leaving your actual problem aside for a moment, as soon as you know how to write a sub you should be able to solve your problem: the `/e` modifier lets you put code in the replacement part of an s///. So just capture the text you want to process, and process it in a separate sub. Of course, you can also inline the sub's code in the replacement part itself, but then you have to take great care with the delimiters, or else the perl parser may easily get confused. For example, the whole program I suggested elsewhere may be cast into the form of a single substitution:

```perl
s/ ^ \b ( [ \w \s \[ \] ]+ \s+ \( \s* $ ) ( .*? ^ \) \s* $ ) /
    my ($head, $body) = ($1, $2);
    $body =~ s|QUALIFIED|| unless $head =~ m|^\w+?clk\[\d\]|;
    $head . $body
/gemsx;
```

`--` If you can't understand the incipit, then please check the IPB Campaign.
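The capture-then-delegate pattern described above can be sketched against the question's original data (the `TEXT TO BE REMOVED` / `blabla5` version, not the later `QUALIFIED` format). Assumptions: each paragraph begins with a headline of the form `name[N] OUTPUT (` or `name [N] OUTPUT (` at the start of a line, and a paragraph runs until the next headline or the end of the text. `fix_para`, `fix_all` and `$HEAD` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper, called once per captured paragraph: the inner
# substitution of the "double search".
sub fix_para {
    my ($para) = @_;
    my ($name) = $para =~ /^(\w+)/;
    return $para if $name eq 'blabla5';
    $para =~ s/^TEXT TO BE REMOVED\n//mg;
    return $para;
}

# Assumed headline shape; /m makes ^ match at every line start.
my $HEAD = qr/^\w+\s*\[\d+\]\s+OUTPUT\s*\(/m;

sub fix_all {
    my ($text) = @_;
    # Outer search: capture one paragraph (headline up to the next
    # headline or end of string); /e passes it to fix_para.
    $text =~ s/($HEAD.*?)(?=$HEAD|\z)/fix_para($1)/gse;
    return $text;
}

my $sample = <<'END';
blabla_data[28] OUTPUT (
REQUIRED (
late_tra
)
TEXT TO BE REMOVED
)
blabla5 [145] OUTPUT (
REQUIRED (
late_t(7)
)
TEXT TO BE REMOVED
)
END

print fix_all($sample);
```

Because the outer regex only finds paragraph boundaries, the nested `REQUIRED ( ... )` blocks never have to be matched; the trade-off is that the approach relies on headlines being recognizable on their own.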