guyov1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to write a script that searches a file containing a lot of paragraphs. First I need to check each paragraph's headline, and if it matches I need to substitute a string in that specific paragraph only. I know how to use m//g for searching and s/// for substituting, but I don't know how to do a "double search": after finding my match, finding another match inside it. I need to do that "double search" many times, because more than one paragraph has to be matched. OK, this is part of the text I'm looking at:
285 blabla_data[28] OUTPUT (
286 REQUIRED (
287 lead_up 0.193118 br fclk clkdom(3)
288 early_lea clkdom(3)
289 late_trail_dn -0.084738 br fclk clkdom(3)
290 late_tra
291 )
292 cext %0.00151055
299 min_ceff_up %0.034
300 TEXT TO BE REMOVED
301 )
302 blabla5 [145] OUTPUT (
303 REQUIRED (
304 early qclk clkdom(6)
305 early_l qclk clkdom(7)
306 late_trail_dn -0.125163 bf qclk clkdom(7)
307 late_t(7)
308 TEXT TO BE REMOVED
309 )
As you can see, the file contains a lot of paragraphs with headlines such as "blabla". I want to delete every "TEXT TO BE REMOVED" line except in the paragraphs called "blabla5". These replacements need to happen throughout the file. Thanks a lot, Guy

Replies are listed 'Best First'.
Re: Substitute in a subparagraph
by ikegami (Patriarch) on Oct 19, 2008 at 07:50 UTC

    As soon as you get into nesting, you really need a parser. Fortunately, some properties of your data language allow us to construct one rather easily.

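    use strict;
    use warnings;

    # Strip the marker line from one complete paragraph unless its
    # headline is 'blabla5', then print the paragraph.
    sub process_para {
        my ($para) = @_;
        if (my ($id) = $para =~ /^(\w+)/) {
            if ($id ne 'blabla5') {
                $para =~ s/^TEXT TO BE REMOVED\n//mg;
            }
        }
        print($para);
    }

    {
        my ($depth, $buf);
        while (<DATA>) {
            $buf .= $_;
            ++$depth if /\(\s*$/;        # line ends with "(": one level deeper
            --$depth if /^\s*\)\s*$/;    # line is a lone ")": one level up
            #printf("[%d] %s", $depth, $_);
            if (!$depth) {               # back at top level: paragraph complete
                process_para($buf);
                $buf = '';
            }
        }
        die("Bad nesting\n") if $depth;
    }

    __DATA__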

    Update: With the new data format, you'll need

    sub process_para {
        my ($para) = @_;
        if (my ($id) = $para =~ /^ [ ]* (\S+)/x) {
            if ($id !~ /^.*clk\[\d\]\z/) {
                $para =~ s/^ [ ]* QUALIFIED [ ]* \n//xmg;
            }
        }
        print($para);
    }
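
    For readers who want to try the depth-counting idea without a __DATA__ section, here is a minimal sketch that works on a string instead of a filehandle. It is not the code posted above: the sub name strip_marker and its $keep and $marker parameters are hypothetical, and the sample is a trimmed, balanced variant of the posted text, assuming the leading numbers in the original post are editor line numbers rather than file content.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns $text with the $marker
# line removed from every top-level paragraph whose headline is not $keep.
# Assumes a line ending in "(" opens a block and a line holding only ")"
# closes one.
sub strip_marker {
    my ($text, $keep, $marker) = @_;
    my ($depth, $buf, $out) = (0, '', '');
    for my $line (split /^/m, $text) {      # split into lines, keeping "\n"
        $buf .= $line;
        ++$depth if $line =~ /\(\s*$/;
        --$depth if $line =~ /^\s*\)\s*$/;
        die "Bad nesting\n" if $depth < 0;
        next if $depth;                     # still inside a paragraph
        # Depth is back to zero: $buf holds one complete paragraph.
        my ($id) = $buf =~ /^\s*(\w+)/;
        $buf =~ s/^[ \t]*\Q$marker\E[ \t]*\n//mg
            if defined $id && $id ne $keep;
        $out .= $buf;
        $buf = '';
    }
    die "Bad nesting\n" if $depth;
    return $out;
}

# Trimmed, balanced sample (an assumption, see above).
my $sample = <<'END';
blabla_data[28] OUTPUT (
REQUIRED (
lead_up 0.193118 br fclk clkdom(3)
)
cext %0.00151055
TEXT TO BE REMOVED
)
blabla5 [145] OUTPUT (
REQUIRED (
early qclk clkdom(6)
)
TEXT TO BE REMOVED
)
END

print strip_marker($sample, 'blabla5', 'TEXT TO BE REMOVED');
```

    Only the first paragraph loses its marker line; the blabla5 paragraph is printed unchanged. If the line numbers really are part of the file, the headline and marker patterns would need to skip them first.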
Re: Substitute in a subparagraph
by GrandFather (Saint) on Oct 19, 2008 at 06:26 UTC

    So what does your current code look like? You could show us a sample of your current code. Remember to use <DATA> to read the sample you show us from a __DATA__ section in your sample script. It may be that we can show you a whole lot of good stuff if only we had some code to comment on.


    Perl reduces RSI - it saves typing
      Exactly the way I posted... I can't post the whole file, but it looks exactly the same. It repeats itself starting at line 1.
      285 blabla_data[28] OUTPUT (
      286 REQUIRED (
      287 lead_up 0.193118 br fclk clkdom(3)
      288 early_lea clkdom(3)
      289 late_trail_dn -0.084738 br fclk clkdom(3)
      290 late_tra
      291 )
      292 cext %0.00151055
      299 min_ceff_up %0.034
      300 TEXT TO BE REMOVED
      301 )
      302 blabla5 [145] OUTPUT (
      303 REQUIRED (
      304 early qclk clkdom(6)
      305 early_l qclk clkdom(7)
      306 late_trail_dn -0.125163 bf qclk clkdom(7)
      307 late_t(7)
      308 TEXT TO BE REMOVED
      309 )

        I don't want to see more data. I want to see your code. Here's a start:

        use warnings;
        use strict;

        ...

        while (<DATA>) {
            ...
        }

        ...

        __DATA__
        285 blabla_data[28] OUTPUT (
        286 REQUIRED (
        287 lead_up 0.193118 br fclk clkdom(3)
        288 early_lea clkdom(3)
        289 late_trail_dn -0.084738 br fclk clkdom(3)
        290 late_tra
        291 )
        292 cext %0.00151055
        299 min_ceff_up %0.034
        300 TEXT TO BE REMOVED
        301 )
        302 blabla5 [145] OUTPUT (
        303 REQUIRED (
        304 early qclk clkdom(6)
        305 early_l qclk clkdom(7)
        306 late_trail_dn -0.125163 bf qclk clkdom(7)
        307 late_t(7)
        308 TEXT TO BE REMOVED
        309 )

        just fill in the dotted parts.


        Perl reduces RSI - it saves typing
Re: Substitute in a subparagraph
by blazar (Canon) on Oct 19, 2008 at 10:47 UTC
    I know how to use m//g for search and s/// for substituting but I don't know how to do a "double search": after finding my match - finding another match inside.

    I personally believe that, leaving your actual problem aside for a moment, you should have this solved as soon as you know how to write a sub: the /e modifier lets you put code in the replacement part of an s///. So just capture the text you want to process, and process it in a separate sub. Of course, you can also inline the sub's code in the replacement part itself, but then you have to take great care with the delimiters, or the perl parser may easily get confused. For example, the whole program I suggested elsewhere can be cast as a single substitution:

    s/
        ^ \b ( [ \w \s \[ \] ]+ \s+ \( ) $
        ( .*? ^\)$ )
    /
        my ($head, $body) = ($1, $2);
        $body =~ s|QUALIFIED|| unless $head ~~ m|^\w+?clk\[\d\]|;
        $head . $body
    /gemsx;
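
    The "separate sub" variant described above can be sketched as follows, using the original question's marker and headline. This is an illustration under stated assumptions, not the poster's program: the names fix_para and rewrite_paragraphs are hypothetical, and the sample assumes nested lines are indented so that only a paragraph's final ")" sits in column 0.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): the "inner search".
# Copies its arguments first, because $1 and $2 would be clobbered by the
# nested regex operations below.
sub fix_para {
    my ($head, $body) = @_;
    $body =~ s/^[ \t]*TEXT TO BE REMOVED[ \t]*\n//mg
        unless $head =~ /^blabla5\b/;
    return $head . $body;
}

# Hypothetical wrapper: the "outer search". $1 is a headline line that
# starts in column 0 and ends in "("; $2 is the shortest body that ends
# with a line starting with ")" in column 0. /e runs the replacement as code.
sub rewrite_paragraphs {
    my ($text) = @_;
    $text =~ s/ ^ ( \S [^\n]* \( [ \t]* \n ) ( .*? ^ \) [ \t]* \n )
              / fix_para($1, $2) /gemsx;
    return $text;
}

# Indented sample (the indentation is an assumption, see above).
my $sample = <<'END';
blabla_data[28] OUTPUT (
  REQUIRED (
    lead_up 0.193118 br fclk clkdom(3)
  )
  cext %0.00151055
  TEXT TO BE REMOVED
)
blabla5 [145] OUTPUT (
  REQUIRED (
    early qclk clkdom(6)
  )
  TEXT TO BE REMOVED
)
END

print rewrite_paragraphs($sample);
```

    Keeping the inner substitution in its own sub sidesteps the delimiter problem entirely, since the replacement part holds nothing but a sub call. Without the indentation assumption the non-greedy body would stop at the first nested ")", which is where a depth-counting loop is the safer tool.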
    --
    If you can't understand the incipit, then please check the IPB Campaign.