agynr has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone, I am using the following code:

$pattern = '(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n)';
$target_data =~ s/$pattern/" " x length($&)/gie;

It finds the required pattern and replaces it with spaces. In addition, if the matched text contains any blank lines, those should be converted to spaces equal to the number of lines found. How can I do this?

Re: Replacing the pattern
by davido (Cardinal) on Jan 20, 2005 at 09:42 UTC

    First, change your substitution regexp to: s/$pattern/' ' x length( $1 )/gie; ...note the use of $1 instead of $&. It is advisable to avoid $& whenever possible, for performance reasons.
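As a minimal sketch of davido's first suggestion, here is the OP's substitution with $1 in place of $&, run on a short sample string. The sample text is an assumption modelled on the data posted later in the thread. Note that every newline inside the match also becomes a space, which is exactly the behaviour the OP wants to change.

```perl
use strict;
use warnings;

# Hypothetical sample text (layout assumed from later posts in the thread).
my $target_data = "</TABLE>\n\n2\n<PAGE>\n\nRISK/RETURN SUMMARY\n";

# The OP's pattern; the parentheses capture the whole match into $1.
my $pattern = '(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n)';

# Replace the match with the same number of spaces, using $1 instead of $&.
$target_data =~ s/$pattern/' ' x length($1)/gie;

print $target_data;
```

Here the 12-character match (including its five newlines) becomes 12 spaces, so "RISK/RETURN SUMMARY" ends up on the same line as "</TABLE>".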

    Now to answer your question: Use a second regexp to keep it simple, and it should look like this:

    $target_data =~ s/(\n{2,})/"\n" . ( ' ' x length( $1 ) ) /ge;

    That converts any run of two or more newlines into a single newline followed by a line of spaces, one space per newline in the run. If you don't want the spaces to start on a new line, remove the "\n" part from the right side of the substitution regexp.
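A self-contained sketch of the second regexp, applied to a made-up string (an assumption for illustration only). The "\n" is double-quoted on purpose: inside single quotes, '\n' is a literal backslash followed by n, not a newline.

```perl
use strict;
use warnings;

# Hypothetical input: three newlines in a row between two lines of text.
my $text = "line one\n\n\nline two\n";

# Copy $text, then replace each run of 2+ newlines with one newline
# followed by one space per newline in the run.
(my $spaced = $text) =~ s/(\n{2,})/"\n" . (' ' x length($1))/ge;

print $spaced;
```

The run of three newlines becomes one newline plus three spaces; the single trailing newline is left alone because it does not match \n{2,}.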


    Dave

      This is not what I want. I want to change the lines, if any, only inside the matched pattern, not in the whole data. The sample data is
      </TABLE> 2 <PAGE> abcdefghijklmnopq
      I hope that you understand my problem.

        davido++ has given you a solution which replaces every newline with a space. Isn't that exactly what you're asking for?

        If not, you'll need to show several examples - enough that we can see what you want. Make sure the spaces are countable.

        After Compline,
        Zaxo

Re: Replacing the pattern
by Roy Johnson (Monsignor) on Jan 20, 2005 at 16:33 UTC
    Does this do what you want? I have used ! instead of space to make it easier to count.

    Update: Finally grasped the feature that the OP wanted and wasn't getting: In addition to substituting spaces, maintain the same number of lines. Like this:

    $_ = <<'EOS';
    </TABLE>

    2
    <PAGE>

    RISK/RETURN SUMMARY AND FUND EXPENSES


    PRIME MONEY MARKET FUND
    EOS
    my $pat = qr/(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n\s*)(.*\n)/i;
    s{$pat}{
        my ($m1, $m2) = ($1, $2);
        my $num_lines = $m1 =~ tr/\n//;  # Just counting lines
        $m1 =~ tr//!/c;  # Replacing all chars with ! (for visibility; use space instead in final code)
        $m1 . $m2 . "\n" x $num_lines;
    }gie or warn "No match!\n";
    print;
    Output is:
    </TABLE>!!!!!!!!!!!!RISK/RETURN SUMMARY AND FUND EXPENSES







    PRIME MONEY MARKET FUND
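Roy Johnson's code counts the newlines in the match and re-emits them after the following line. A possible variant (not from the thread, just a sketch) keeps each newline exactly where it was: tr/\n/ /c turns every character except a newline into a space, so both the length of the text and its line count stay the same. The sample string and variable names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample, modelled on the data posted in this thread.
my $text = "</TABLE>\n\n2\n<PAGE>\n\nRISK/RETURN SUMMARY AND FUND EXPENSES\n";
my $orig = $text;

my $pat = qr/(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n)/i;

# Copy the match, blank every non-newline character in the copy
# (tr///c complements the search list), and use the copy as the replacement.
$text =~ s{$pat}{ (my $m = $1) =~ tr/\n/ /c; $m }ge;

print $text;
```

The "2" and "<PAGE>" become spaces on their original lines, so any later line numbers in the document are unaffected.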

    Caution: Contents may have been coded under pressure.
Re: Replacing the pattern
by sasikumar (Monk) on Jan 20, 2005 at 12:38 UTC
    Hi

    Is this what you require?

    my $pattern = '(</TABLE>\n\s*[0-9]{1,3}\s*\n\s*<page>\s*)(.*)(\n*)';
    $target_data =~ s/$pattern/$2." " x length($3)/gie;
    Thanks
    SasiKumar
Re: Replacing the pattern
by holli (Abbot) on Jan 20, 2005 at 09:43 UTC
    Can you please provide some sample data?

    holli, regexed monk
      The sample data is
      </TABLE> 2 <PAGE> abcdefghijklmnopq
      I hope that you understand my problem. Please help me solve it.
        And this should become... ?
The same pattern searching
by agynr (Acolyte) on Jan 21, 2005 at 06:14 UTC
    Hello everyone, I have a problem that keeps bothering me. I have tried every method I can think of, but in vain. I hope you can help me out. The following is just an excerpt of the document, in which this pattern can occur many times. There is also no fixed position for the <page> tag or its number (here 2): there can be any number of spaces or lines before, inside, or after the pattern. Suppose my data is this:
    </TABLE> 2 <PAGE> RISK/RETURN SUMMARY AND FUND EXPENSES PRIME MONEY MARKET FUND
    It should be converted to
    </TABLE> RISK/RETURN SUMMARY AND FUND EXPENSES PRIME MONEY MARKET FUND
    I hope this gives you a better picture.
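Since the OP says the number and the <page> tag can be surrounded by any amount of whitespace, one option is a single, more permissive pattern. This is a sketch under stated assumptions, not a tested solution for the real documents: the page-number forms ("2", "-2-", "A-2"), the idea that the number may come before or after the tag, and the names $num, $marker and blank_page_markers are all hypothetical. Each match is blanked with tr/\n/ /c, which keeps its newlines in place.

```perl
use strict;
use warnings;

# Hypothetical page-number forms: "2", "-2-" or "A-2" (1 to 3 digits).
my $num = qr/(?:[A-Za-z]-)?-?[0-9]{1,3}-?/;

# Hypothetical marker: the number either before or after a <page> tag,
# with any amount of whitespace (spaces or blank lines) around it.
my $marker = qr{
      \s* $num \s* <page> \s*    # number, then tag
    | \s* <page> \s* $num \s*    # tag, then number
}xi;

# Blank every non-newline character of each marker so that the
# number of lines in the text stays the same.
sub blank_page_markers {
    my ($text) = @_;
    $text =~ s{($marker)}{ (my $m = $1) =~ tr/\n/ /c; $m }ge;
    return $text;
}

print blank_page_markers("abc\n<page>\n24\ndef\n");
```

Because the leading \s* is greedy and the match starts as far left as possible, blank lines just before the marker are absorbed into it and kept as newlines. A pattern this loose may also catch numbers that are not page numbers, so it would need checking against the real data.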

    Edit by castaway - reparented under original question

      Hi,

      Does this solve your problem? It works for the sample data you posted.

      use strict;
      my $target_data = "</TABLE>\n\n2\n<PAGE>\n\nRISK/RETURN SUMMARY AND FUND EXPENSES\n\n\nPRIME MONEY MARKET FUND";
      my $pattern = '(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n)(.*)(\n*)';
      $target_data =~ s/$pattern/" " x length($1).$2."\n" x length($3)."\n" x length($1)/gie;
      print $target_data;
      print "\n\n";


      The output is

      </TABLE> RISK/RETURN SUMMARY AND FUND EXPENSES PRIME MONEY MARKET FUND
      If there seem to be too many "\n"s, you can change the concatenation at the end of my regex to reduce them:
      "\n" x length($3)."\n" x length($1)/gie;
      Thanks
      SasiKumar
        Thanks Sasi, but the above code doesn't solve my problem. It is not a general solution for all the patterns found; it only works for the example above. As I said earlier, the location of the pattern and the number of lines in it are not fixed, and there is no guarantee that the page number appears only in that form. The pattern found could also be
        abc The pattern follows <page> 24 def
        For a case like the one above, the code will not work. I had tried this code before. I wrote the patterns for the cases above as
        $pattern = '(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n)';
        if ($target_data =~ /$pattern/gi) {
            $target_data =~ s/$pattern/" " x length($1)." " x length($2)." " x length($3)." " x length($4)/gie;
        }
        $pattern = '(\n\s*-[0-9]{1,3}-\s*\n\s*<page>\s*\n)';
        if ($target_data =~ /$pattern/gi) {
            $target_data =~ s/$pattern/" " x length($1)/gie;
        }
        $pattern = '(\n\s*[A-Za-z]-[0-9]{1,3}\s*\n\s*<page>\s*\n)';
        if ($target_data =~ /$pattern/gi) {
            $target_data =~ s/$pattern/" " x length($1)/gie;
        }
        $pattern = '(\n\s*page\s*[0-9]{1,3}\s*of\s*[0-9]{1,3}\s*\n)';
        if ($target_data =~ /$pattern/gi) {
            $target_data =~ s/$pattern/" " x length($1)/gie;
        }
        But the above patterns fail to count the number of lines found in the matched data, which should be converted into spaces. I hope you now have a clear picture of the problem.
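Here is a sketch of how the OP's four patterns could count lines, in the spirit of Roy Johnson's reply. Each match is replaced by one space per non-newline character, followed by the same number of newlines it contained, so the document keeps its line count. The helper name spaces_keep_lines and the sample string are hypothetical. The separate if (/.../gi) tests are dropped because s///g does nothing when there is no match. Also note that the first original substitution refers to $2, $3 and $4, which are never set because each pattern has only one capture group.

```perl
use strict;
use warnings;

# The four page-marker patterns from the post above, compiled once with qr//i.
my @patterns = (
    qr/(\n\s*[0-9]{1,3}\s*\n\s*<page>\s*\n)/i,
    qr/(\n\s*-[0-9]{1,3}-\s*\n\s*<page>\s*\n)/i,
    qr/(\n\s*[A-Za-z]-[0-9]{1,3}\s*\n\s*<page>\s*\n)/i,
    qr/(\n\s*page\s*[0-9]{1,3}\s*of\s*[0-9]{1,3}\s*\n)/i,
);

# Hypothetical helper: spaces for the non-newline characters of a match,
# followed by as many newlines as the match contained.
sub spaces_keep_lines {
    my ($match) = @_;
    my $lines = ($match =~ tr/\n//);    # count newlines without modifying
    return (' ' x (length($match) - $lines)) . ("\n" x $lines);
}

# Hypothetical sample data.
my $target_data = "</TABLE>\n\n2\n<PAGE>\n\nRISK/RETURN\n";

for my $pattern (@patterns) {
    $target_data =~ s/$pattern/spaces_keep_lines($1)/ge;
}

print $target_data;
```

In the sample, the first pattern matches 12 characters containing 5 newlines, so they become 7 spaces followed by 5 newlines; the other three patterns find nothing.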