nysus has asked for the wisdom of the Perl Monks concerning the following question:

Let's say I've got the following string in a file:
"skkjnfinduekasdhelkjadfoijejdklk"
Between each character in the string, there can be zero or more newlines. So the file could look something like this:
s
kkjn




find
ue
k


asd
helkjadfoijejdklk
My question is: is there a way to tell the RE engine to ignore newline characters, so that I don't have to write

/u\n*e\n*k\n*a\n*s\n*d\n*h\n*/

to match the string "uekasdh"?

While the above is simple enough to type for a short string, for longer ones I'd have to use a function to generate the RE, which seems a little too weird.

IMPORTANT NOTE: I would like to be able to replace a string like "ue\nk\n\n\nasd\nh" in the example file above with another string like "jjjjjjj", but I want to keep the other newlines in the file intact. In other words, I want to avoid stripping the newlines out of the file to do the search.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop";
$nysus = $PM . $MCF;

Re: Matching across newlines without stripping them out
by BrowserUk (Patriarch) on Jun 15, 2003 at 19:16 UTC

    The simple answer to "is there a way to tell the RE engine to ignore newline characters?" is no.
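    To make that "no" concrete: Perl's /x modifier is the closest-sounding feature, but it only tells the engine to ignore whitespace inside the pattern, not inside the string being searched. The short sketch below (sample strings made up for illustration) shows that a newline in the target still has to be matched explicitly.

```perl
use strict;
use warnings;

my $target = "ue\nk";    # hypothetical sample: a newline between "e" and "k"

# /x makes the engine ignore whitespace inside the *pattern* ...
my $pattern_ok = ("uek" =~ /u e k/x) ? 1 : 0;

# ... but a newline in the *target string* is still a character to be matched.
my $plain_match = ($target =~ /u e k/x) ? 1 : 0;

# So the gaps still have to be spelled out, e.g. with \n*.
my $explicit_ok = ($target =~ /u \n* e \n* k/x) ? 1 : 0;

print "pattern_ok=$pattern_ok plain_match=$plain_match explicit_ok=$explicit_ok\n";
```

    This prints pattern_ok=1 plain_match=0 explicit_ok=1.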

    However, I think that you are dismissing the idea of generating the regex too quickly. It is a quite legitimate thing to do. In this case I would use \s* instead of \n* if there is any chance of other whitespace that you want to ignore.

    #! perl -slw
    use strict;

    my $search_string = 'uekasdh';

    # Split the search string into chars, escape any regex
    # metacharacters, and intersperse them with the
    # "ignore whitespace" regex.
    my $re = join '\s*', map quotemeta, split '', $search_string;

    my $data = do{ local $/; <DATA> };  # Slurp the data
    $data =~ s[$re][jjjjjjjj];          # Find and replace.
    print $data;

    __DATA__
    s
    kkjn




    find
    ue
    k


    asd
    helkjadfoijejdklk
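    To illustrate the \s* versus \n* point, here is a minimal sketch assuming a hypothetical copy of the data saved with Windows (CRLF) line endings. There the gaps contain "\r\n", which \n* cannot skip but \s* can.

```perl
use strict;
use warnings;

# Hypothetical fragment of the file with CRLF line endings.
my $crlf  = "find\r\nue\r\nk\r\n\r\nasd\r\nhelk";
my @chars = map quotemeta, split //, 'uekasdh';

my $nl_only = join '\n*', @chars;   # gaps may contain only "\n"
my $any_ws  = join '\s*', @chars;   # gaps may contain any whitespace, incl. "\r"

my $nl_hit = $crlf =~ /$nl_only/ ? 1 : 0;
my $ws_hit = $crlf =~ /$any_ws/  ? 1 : 0;

print "\\n* matches: $nl_hit, \\s* matches: $ws_hit\n";
```

    The \n* version fails as soon as it reaches a "\r", while the \s* version matches.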

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Matching across newlines without stripping them out
by TomDLux (Vicar) on Jun 15, 2003 at 19:09 UTC

    That's the only way I can think of ... but you can make it a bit easier on yourself.

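    my $goal    = "uekasdh";
    # Build "u\n*e\n*k\n*..." from the goal string (quotemeta guards
    # against regex metacharacters in $goal).
    my $search  = join "\\n*", map quotemeta, split "", $goal;
    my $replace = "JJJJJ";
    # $text must hold the whole file slurped into one string; applied
    # to a single line, a pattern spanning newlines can never match.
    $text =~ s/$search/$replace/;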
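    A pattern like this can only match across newlines when it is applied to the whole file contents at once, not line by line. A small sketch on a made-up fragment of the example data compares the two:

```perl
use strict;
use warnings;

my $text   = "find\nue\nk\n\n\nasd\nhelk\n";   # made-up fragment of the example
my $search = join "\\n*", map quotemeta, split //, "uekasdh";

# Line by line: no single line contains the whole word, so nothing matches.
my $per_line_hits = grep { /$search/ } split /^/, $text;

# Whole string: the \n* gaps can now span line boundaries.
my $whole_hits = () = $text =~ /$search/g;

print "per-line matches: $per_line_hits\n";
print "whole-text matches: $whole_hits\n";
```

    This reports 0 per-line matches and 1 whole-text match, which is why the file should be slurped (for example with local $/) before the substitution.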

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Matching across newlines without stripping them out
by ihb (Deacon) on Jun 15, 2003 at 23:44 UTC

    When I first read your question I misinterpreted it. Later, when I was about to post my answer, I realized that your bold sentence should be interpreted as "don't fiddle with anything but the match", whereas I had first interpreted it as "don't fiddle with the newlines at all". So my solution keeps the newlines and spreads your 'jjjjjjj' over the same positions where the characters of "uekasdh" were found.

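    sub replace_keep_whitespace {
        my ($str, $word, $rep) = @_;

        # Match $word allowing any number of newlines between its chars.
        $str =~ join('\\n*', map quotemeta, split //, $word)
            or return $str;             # no match: return $str unchanged

        # Reference to the matched region; editing through it edits $str.
        my $match = \substr($str, $-[0], $+[0] - $-[0]);

        # Replace each non-newline char, in order, with the next char of $rep.
        my $c = 0;
        $$match =~ s/[^\n]/substr($rep, $c++, 1)/eg;
        return $str;
    }

    my $str = <<'_DATA_';
    s
    kkjn




    find
    ue
    k


    asd
    helkjadfoijejdklk
    _DATA_

    my $word = 'uekasdh';
    my $rep  = 'j' x length $word;
    print replace_keep_whitespace($str, $word, $rep);

    __END__
    s
    kkjn




    find
    jj
    j


    jjj
    jelkjadfoijejdklk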

    If the replacement word is longer than the matched word, the extra characters at the end are simply ignored. If it's shorter, then, well, the remaining matched chars are removed (but the newlines are kept).

    I realize this wasn't what was asked for, but I find the solution pretty enough to post it anyway. ;-)
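    The behaviour described above for replacements that are longer or shorter than the match can be checked with a standalone sketch. It repeats the subroutine with two additions that are not in the original post: quotemeta on the search characters, and a guard that yields an empty string once $rep runs out, so that `use warnings` does not complain about substr reading past the end of $rep. The sample string is made up.

```perl
use strict;
use warnings;

sub replace_keep_whitespace {
    my ($str, $word, $rep) = @_;
    $str =~ join('\\n*', map quotemeta, split //, $word)
        or return $str;                       # no match: unchanged
    my $match = \substr($str, $-[0], $+[0] - $-[0]);
    my $c = 0;
    # Non-newline chars get the next char of $rep, or "" once $rep is used up.
    $$match =~ s/[^\n]/$c < length($rep) ? substr($rep, $c++, 1) : ''/eg;
    return $str;
}

my $str = "ue\nk\n\nasd\nh\n";

my $longer  = replace_keep_whitespace($str, 'uekasdh', 'ABCDEFGHIJ');  # extra chars ignored
my $shorter = replace_keep_whitespace($str, 'uekasdh', 'AB');          # leftover chars removed

print $longer, "----\n", $shorter;
```

    With the longer replacement the result is "AB\nC\n\nDEF\nG\n"; with the shorter one only "AB" survives and every newline is kept, giving "AB\n\n\n\n\n".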

    ihb
      vroom: you're assuming \n between each char, when that was just an example.
      Original poster: there is no way to do the search without stripping, i.e. no extended Perl regex feature that ignores a certain char. Even with stripping, it's not that simple to put the \n's back in the right place, though it can be done, just certainly not in a single regex. Perhaps it's achievable with the extended code regex constructs; unfortunately, I don't have the time right now to play with it to that depth. Perhaps you were on the right track to start with: modularize it enough that the code generates the regex strings for you instead of hardcoding them.
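      As a sketch of the "strip, then put the newlines back" route mentioned above (the subroutine name replace_via_stripped_copy is hypothetical, not from the thread): search a newline-free copy, remember where each kept character came from, and translate the match position back into the original string so that only the matched span is replaced. Like the \n* regex, it removes the newlines inside the match and leaves all others alone.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $word in $text, ignoring newlines between
# its chars, by searching a newline-free copy and mapping offsets back.
sub replace_via_stripped_copy {
    my ($text, $word, $rep) = @_;

    # $orig_pos[$i] is the offset in $text of the $i-th non-newline char.
    my @orig_pos;
    my $stripped = '';
    for my $i (0 .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        next if $ch eq "\n";
        push @orig_pos, $i;
        $stripped .= $ch;
    }

    return $text if $word eq '';
    my $at = index($stripped, $word);
    return $text if $at < 0;                  # not found: unchanged

    # First and last matched chars, translated back to offsets in $text.
    my $start = $orig_pos[$at];
    my $end   = $orig_pos[$at + length($word) - 1];

    # Replace only that span; newlines outside it are untouched.
    substr($text, $start, $end - $start + 1) = $rep;
    return $text;
}

my $file   = "find\nue\nk\n\n\nasd\nhelk\n";   # made-up fragment of the example
my $result = replace_via_stripped_copy($file, 'uekasdh', 'jjjjjjj');
print $result;
```

    Here the matched span "ue\nk\n\n\nasd\nh" becomes "jjjjjjj", so the result is "find\njjjjjjjelk\n".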
Re: Matching across newlines without stripping them out
by Anonymous Monk on Jun 17, 2003 at 06:04 UTC
    As mentioned, you cannot make a Perl regex ignore a newline in order to patch together a relevant match. What you can do is make it include everything, including newlines, which effectively ignores them (since they are no longer an obstacle to matching what you want), and then strip out what you don't want. This can be done with the /s modifier on a regex applied to a scalar containing the file contents:

    my ($raw) = $fileContents =~ m/(.*)/s;  # /s lets . match newlines too
    $raw =~ s/\n//g;                        # strip newlines from the copy

    This is similar in spirit to untainting data; moreover, this particular regex never has to backtrack.
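    The effect of /s in that snippet can be seen with a made-up $fileContents: without /s, . stops at the first newline, while with /s the capture covers the whole string. The newline-stripped copy is a separate variable, so $fileContents itself keeps its newlines.

```perl
use strict;
use warnings;

my $fileContents = "find\nue\nk\n\nasd\nh\n";   # made-up sample

# Without /s, "." stops at the first newline, so only the first line is captured.
my ($first_line) = $fileContents =~ m/(.*)/;

# With /s, "." also matches "\n", so the capture spans the whole string.
my ($raw) = $fileContents =~ m/(.*)/s;

(my $stripped = $raw) =~ s/\n//g;    # newline-free copy; $fileContents untouched

print "first_line=[$first_line]\n";
print "stripped=[$stripped]\n";
print "contains uekasdh: ", (index($stripped, 'uekasdh') >= 0 ? "yes" : "no"), "\n";
```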