Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

As a follow-up to a previous question: I am (now) writing a Perl script to do a search and replace (the one-liners became too cumbersome), and the issue that is killing me is that the string I am searching for (in order to replace it) is sometimes split over two lines, a bit like this:

hello there this is a bad <?string?> that we need to take away.
Here is another bad <?st
ring?> that needs to go too.

I need to replace all the <?string?> with <xxx> (let's say)

I have this simple little script, but of course it does not span the line like I want it to

#!/usr/bin/perl
use strict;

print "Info: ARGV is: @ARGV\n";

while (<STDIN>) {
    chomp;
    if (-e $_) {
        # a regular file (might be suited to your needs)
        # do something with $_ as if it were shifted from @ARGV
        print "handling file: $_\n";
        open IN, "<$_" or die "Can't open $_: $!\n";
        open OUT, ">outfile" or die "Can't open outfile: $!\n";
        while (<IN>) {
            s/<?string?>/<xxx>/g;
            print OUT;
        }
        close IN;
        close OUT;
    } else {
        warn "no such file: $_ \n";
    }
}

(yes, I know, I need to escape the special chars - I left them out for readability for now)
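For reference, a minimal sketch of what the escaped pattern might look like. It assumes the search text is a plain literal held in a variable; the names `$target`, `$line` and `$fixed` are made up for illustration. `\Q...\E` quotes the regex metacharacters so that `?` is matched literally:

```perl
use strict;
use warnings;

my $target = '<?string?>';    # literal text to find (hypothetical variable name)
my $line   = "this is a bad <?string?> that we need to take away.\n";

# Unescaped, /<?string?>/ means: optional "<", then "strin", optional "g",
# then ">", so it does NOT match the literal text "<?string?>".
# \Q...\E quotes the metacharacters in the interpolated variable.
(my $fixed = $line) =~ s/\Q$target\E/<xxx>/g;

print $fixed;
```

Writing the pattern by hand as `s/<\?string\?>/<xxx>/g` should be equivalent for this particular string.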

Help!

Replies are listed 'Best First'.
Re: Search and replace over line break
by almut (Canon) on Mar 12, 2010 at 16:48 UTC

    As long as you can guarantee that the substring to match will not extend over more than two lines, you could use a two-line sliding buffer that you apply the substitution to, but then split again before outputting (so you won't get everything twice):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $prev = "";
    my $done = 0;

    while (<DATA>) {
        # sliding two-line buffer
        $_ = $prev.$_;

        # get rid of middle newline, so it doesn't interfere with match
        s/\n//;

        s/<\?string\?>/<xxx>/g;

        # split buf
        # first part will be output, second part kept
        # for joining with next input line
        ($_, $prev) = unpack "a".length($prev)."a*", $_;

        print "$_\n" if 2 .. $done;

        # corner case: have last line be printed as well
        if (eof and !$done++) { $_ = ""; redo }
    }

    __DATA__
    hello there this is a bad <?string?> that we need to take away.
    Here is another bad <?st
    ring?> that needs to go too.

    Output:

    hello there this is a bad <xxx> that we need to take away.
    Here is another bad <xxx
    > that needs to go too.

    As you can see, the line break may end up anywhere in the middle of the replacement text (here, between <xxx and >). If that's a problem, you'd have to modify the logic of splitting the sliding buffer (which can get tricky...).

    (There may be corner cases I've overlooked, but you get the idea...)
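If a line break inside the replacement is a problem, one alternative to the sliding buffer is to work on the whole text at once and let the pattern tolerate a newline between any two characters of the search string. This is a different technique from the one above, sketched under some assumptions: the file is small enough to hold in memory, at most one newline interrupts the string at any given position, and any swallowed newline should be re-emitted right after the replacement so the line count stays the same. The sub name `replace_across_breaks` is made up for this example.

```perl
use strict;
use warnings;

# Replace every occurrence of the literal $target in $text, even when
# newlines interrupt it; swallowed newlines are re-emitted after the
# replacement (an assumption about where they should go).
sub replace_across_breaks {
    my ($text, $target, $replacement) = @_;

    # Quote each character of the literal and allow one optional newline
    # between any two of them.
    my $pat = join '\n?', map { quotemeta } split //, $target;

    $text =~ s{($pat)}{
        my $matched = $1;
        my $breaks  = ($matched =~ tr/\n//);   # count newlines inside the match
        $replacement . ("\n" x $breaks);
    }ge;

    return $text;
}

my $text = <<'END';
hello there this is a bad <?string?> that we need to take away.
Here is another bad <?st
ring?> that needs to go too.
END

my $fixed = replace_across_breaks($text, '<?string?>', '<xxx>');
print $fixed;
```

For a real file, the text could be read in one go by localizing `$/` (for example `my $text = do { local $/; <$fh> };`) before calling the sub.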

Re: Search and replace over line break
by Svante (Sexton) on Mar 12, 2010 at 12:34 UTC

    If there is a newline in your search string, where do you want that newline to appear in your replacement string?

    By the way, seeing as you put <> around your strings: *stern look* you are not trying to parse XML or HTML with regular expressions, are you?

      I am trying to parse XML with regular expressions - not a good idea? It's DITA actually, but that's XML too. Tell me if this is a bad idea - the issue I'm working on is fixing up some XML files that have a whole list of "bad" tags - mostly removing them, but sometimes having to replace them with something else.

        XML is not a regular language (please note that "regular" has a specific meaning with regard to a language; look it up if you do not know what this means). Therefore, you cannot generally parse it with a regular expression.

        Take an XML library that fits your need. Here are some descriptions of such.
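As a rough illustration of the library route, here is a minimal sketch using XML::LibXML (a CPAN module, so it has to be installed separately). The element names `bad`, `good` and `junk`, the sample document and the sub name `fix_xml` are all made up for this example; the real DITA tags will differ. Because the parser works on the document structure, a tag that is split across two lines needs no special handling.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: <bad> is to be renamed to <good>, <junk> is to be
# removed together with its content. Note the line break inside the <bad> tag.
my $xml = <<'XML';
<topic id="t1">
  <p>hello there this is a <bad
     class="x">tag</bad> that we need to rename.</p>
  <p>Here is <junk>one</junk> that needs to go.</p>
</topic>
XML

sub fix_xml {
    my ($source) = @_;
    my $doc = XML::LibXML->load_xml(string => $source);

    # Rename every <bad> element; attributes and children are kept.
    $_->setNodeName('good') for $doc->findnodes('//bad');

    # Detach every <junk> element (and its content) from the tree.
    $_->unbindNode for $doc->findnodes('//junk');

    return $doc->toString;
}

my $fixed = fix_xml($xml);
print $fixed;
```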