
Re: multi line matching problem

by SquireJames (Monk)
on Dec 16, 2003 at 02:18 UTC (#314950)

in reply to multi line matching problem

$data =~ y/\n/ /; # This replaces every newline with a space
Better still, if you want to collapse every run of more than one whitespace character into a single space, and take care of any newlines at the same time, you can do so with this regex:
$data =~ s/\s+|\n/ /g;
The main key here is that \s* has been replaced with \s+, so it's no longer greedy and will replace only multiple space characters.

The difference between s and m is that s is used to substitute an expression, whilst m is used to test for a pattern match. The m is rarely written out, as it is implied when you place a regular expression between two slashes (i.e. $data =~ /\n/g is the same as $data =~ m/\n/g). The y operator (also spelled tr) is a simple character-for-character replacement (transliteration); it does not take a regular expression.
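To make the three operators concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original question.
my $data = "one\ntwo\nthree";

# m// tests for a match; the leading m is optional with slash delimiters.
my $bare     = ($data =~ /\n/)  ? 1 : 0;
my $explicit = ($data =~ m/\n/) ? 1 : 0;

# s/// substitutes; /g applies it to every match, not just the first.
(my $subbed = $data) =~ s/\n/ /g;

# y/// (same as tr///) maps single characters one-for-one, no regex involved.
(my $translit = $data) =~ y/\n/ /;

print "bare=$bare explicit=$explicit\n";
print "s///: $subbed\n";
print "y///: $translit\n";
```

For a single-character swap like this one, s///g and y/// end up with the same string; they only diverge once you need a real pattern such as \s+.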

Update: With much thanks to Enlil for the explanation, the whole regex could actually be written without the \n (i.e. $data =~ s/\s+/ /g).
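A quick way to convince yourself of the update is to run both forms over the same input and compare. This is only a sketch; the input string and variable names are invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical input with single spaces, a run of spaces, and newlines.
my $input = "15 65\n35   6\n\nStuff...";

# Original form, with the extra \n alternative.
(my $with_alt = $input) =~ s/\s+|\n/ /g;

# Simplified form from the update.
(my $without_alt = $input) =~ s/\s+/ /g;

print "identical\n" if $with_alt eq $without_alt;
print "$without_alt\n";   # 15 65 35 6 Stuff...
```

Because \s already covers \n, the second alternative never gets a chance to match anything the first one has not already consumed.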

Replies are listed 'Best First'.
Re: Re: multi line matching problem
by welchavw (Pilgrim) on Dec 16, 2003 at 02:50 UTC

    Part of your explanation is flatly erroneous. That \s+ is still greedy. The \n is never matched in your s/\s+|\n/ regex.

      For the record, this is the testing that I did, which works fine by me.
      $data = "15 65\n35 6\n445 34,546 59034584\n54 3,450 805;5409 8534\n\nStuff...";
      print ($data);
      print ("\nChanging now\n\n");
      $data =~ s/\s+|\n/ /g;
      print ($data);
      Sorry if I misunderstood the question, and I'll take the hit on the greedy statement, perhaps I should have said greedy only for space characters, which is what is wanted....

        Greediness is often misunderstood, but actually that is beside the point considering the regex supplied. After all, what are you saying when you say s/\s+|\n/ /g? Consider...

        $data = "\n\n\n";
        $data =~ s/\s+/foobar/g;
        print "data=>$data\n";

        This prints "data=>foobar", because \n is whitespace covered by \s, so a single \s+ match swallows all three newlines.

        Both * and + are greedy operators when not flipped to non-greedy by the application of ?. True, the implications of greediness are very important when greedy operators are combined with alternation (as above), but those implications do not come into play given the example. In the given regex, if you used ? to change your greedy + operator to non-greedy, then the RHS of the alternation operation would be applied. Regardless, the effect would be the same (' ' would be substituted for \n versus for \s, but you'd see the same result). Update: Let me strike that last bit. Got effect of ? on alternation messed up. Rest of explanation is fine (I think).
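For anyone who wants to see greedy versus non-greedy side by side, here is a small sketch. The input string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical input: three newlines between two letters.
my $input = "a\n\n\nb";

# Greedy: \s+ takes the whole run, so one space replaces all three newlines.
(my $greedy = $input) =~ s/\s+/ /g;

# Non-greedy: \s+? settles for one character per match, so /g replaces
# each newline separately.
(my $lazy = $input) =~ s/\s+?/ /g;

# Adding the \n alternative changes nothing: the left side matches first.
(my $lazy_alt = $input) =~ s/\s+?|\n/ /g;

print "greedy:   [$greedy]\n";     # [a b]
print "lazy:     [$lazy]\n";       # [a   b]
print "lazy_alt: [$lazy_alt]\n";   # [a   b]
```

So the ? changes how much each match consumes, not which side of the alternation gets used.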

        Forgive me if I am missing something (I have the feeling I must be).

        On another note (to the original author of the SOPW node), please note that you don't need multi-line matching here. The m and s regex modifiers only have to do with "^", "$", and ".", not with \s.
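A short sketch of what /s and /m do and do not affect, using a made-up two-line string (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical two-line string.
my $text = "a\nb";

# Without /s, "." does not match a newline; with /s it does.
my $dot_plain = ($text =~ /a.b/)  ? 1 : 0;
my $dot_s     = ($text =~ /a.b/s) ? 1 : 0;

# Without /m, "^" anchors only at the start of the string;
# with /m it also anchors after each embedded newline.
my $caret_plain = ($text =~ /^b/)  ? 1 : 0;
my $caret_m     = ($text =~ /^b/m) ? 1 : 0;

# \s matches the newline with no modifier at all.
my $ws_plain = ($text =~ /a\sb/) ? 1 : 0;

print "dot: plain=$dot_plain s=$dot_s\n";         # dot: plain=0 s=1
print "caret: plain=$caret_plain m=$caret_m\n";   # caret: plain=0 m=1
print "whitespace: plain=$ws_plain\n";            # whitespace: plain=1
```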

Re: Re: multi line matching problem
by jcpunk (Friar) on Dec 16, 2003 at 02:47 UTC
    Thanks for the tip on the implied m///. I will keep looking around at these things.
