PerlMonks  
Even after multiple attempts, I am at a total loss of how the "m" and "s" works for regex.
  • /m changes the meaning of ^ and $:
    • Without /m,
      • ^ matches only at the very beginning of the string. (This is the same as \A, except that \A is not affected by /m.)
      • $ matches at the very end of the string, but if the string ends with \n, it will match just before and just after this \n. (This is the same as \Z, except that \Z is not affected by /m.)
    • With /m,
      • ^ matches at the very beginning of the string, and just after any \n, except if the \n is the last character in the string. In other words, it matches at the beginning of each line within the string.
      • $ matches just before each \n, in other words before the end of every line within the string, and at the very end of the string.
  • /s changes the meaning of .:
    • Without /s, . matches anything except the newline, i.e. [^\n]. In other words, a regex of /.+/g is limited to matching one line within the string at a time.
    • With /s, . matches absolutely any character, including \n.
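The rules above can be checked with a small sketch. The helper names `match_offsets` and `first_dot_run` and the sample string are illustrative only, not part of the original post; the sketch simply collects the offset at which each match starts.

```perl
use strict;
use warnings;

# Illustrative helper: start offsets of every match of $re in $str.
sub match_offsets {
    my ($str, $re) = @_;
    my @offsets;
    push @offsets, $-[0] while $str =~ /$re/g;
    return @offsets;
}

# Illustrative helper: what /.+/ grabs first, with or without /s.
sub first_dot_run {
    my ($str, $with_s) = @_;
    return $with_s ? ($str =~ /(.+)/s)[0] : ($str =~ /(.+)/)[0];
}

my $str = "a\nb\nc\n";    # offsets: a=0 \n=1 b=2 \n=3 c=4 \n=5, end=6

printf "%-14s %s\n", '^ without /m:', join ' ', match_offsets($str, qr/^/);   # 0
printf "%-14s %s\n", '^ with /m:',    join ' ', match_offsets($str, qr/^/m);  # 0 2 4
printf "%-14s %s\n", '$ without /m:', join ' ', match_offsets($str, qr/$/);   # 5 6
printf "%-14s %s\n", '$ with /m:',    join ' ', match_offsets($str, qr/$/m);  # 1 3 5 6

for my $with_s (0, 1) {
    # Show newlines as a literal \n so the result stays on one line.
    (my $shown = first_dot_run($str, $with_s)) =~ s/\n/\\n/g;
    printf "%-14s %s\n", ($with_s ? '.+ with /s:' : '.+ without /s:'), $shown;
}
```

Note that ^ with /m does not match at offset 6: the final \n is the last character of the string, so no new line starts after it.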

Note that /m and /s are completely independent of one another. Keep in mind that ^ and $ are zero-width matches: for example, with $_ = "a\nb", a regex of /$/gm will match and leave the regex engine's position just before the \n*, and a following regex of /./gs would then match that \n. Here is some code to play around with (try changing the lists of strings and regexes). As you can see, /m only makes a difference if the string contains a \n that is not its very last character. And of course there's the WebPerl Regex Tester, which visualizes this as well (modern browser required).
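The zero-width point can be seen by watching pos(). This is a minimal sketch with an illustrative helper name, `anchor_then_dot`, that runs the two regexes from the example in scalar context against the same string.

```perl
use strict;
use warnings;

# Illustrative helper: run /$/gm, then /./gs, against the same string
# and report where each match leaves pos().
sub anchor_then_dot {
    my ($str) = @_;
    $str =~ /$/gm or return;          # zero-width match just before the first \n
    my $pos_after_anchor = pos($str);
    $str =~ /(.)/gs or return;        # continues from pos(), consumes one character
    return ($pos_after_anchor, $1, pos($str));
}

my ($p1, $char, $p2) = anchor_then_dot("a\nb");
printf "after /\$/gm: pos=%d\n", $p1;                                       # pos=1
printf "/./gs matched %s, pos=%d\n", ($char eq "\n" ? '\n' : $char), $p2;   # \n, pos=2
```

The $ consumed nothing, so the \n it matched in front of is still there for /./gs to match.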

use warnings;
use strict;
use open qw/:std :utf8/;
use Term::ANSIColor qw/colored/;

for my $str ( "a","a\n","a\nb","a\n\nb","a\nb\nc\n","a\nb\nc\nd") {
    for my $regex ( '/^/g','/^/gm','/$/g','/$/gm','/./g','/./gs' ) {
        # Render each character in a two-column cell; control characters
        # are shown as Unicode "control picture" symbols such as ␊.
        my $o = join( '', map { sprintf "%2s",
                chr( $_<0x21 ? 0x2400+$_ : $_==0x7F ? 0x2421 : $_ ) }
            map ord, split //, $str )." ";
        # Collect the start/end offsets of every match.
        my @matches;
        eval qq{ push \@matches, [[\@-],[\@+]] while \$str=~$regex ;1} or die $@;
        my ($matchcnt,%matches) = (1);
        for my $match (@matches) {
            # Zero-width matches mark the gap before a character,
            # other matches mark the characters themselves.
            my @pos = $match->[0][0]==$match->[1][0]
                ? ( $match->[0][0] * 2 )
                : map { $_*2+1 } $match->[0][0]..$match->[1][0]-1;
            for my $p (@pos) {
                die "overlapping matches not supported" if exists $matches{$p};
                $matches{$p} = $matchcnt;
            }
        }
        continue { $matchcnt++ }
        substr($o, $_, 1) = colored(['underline'], substr($o, $_, 1))
            #"<u>".substr($o, $_, 1)."</u>" # alternative for HTML
            for sort { $b<=>$a } keys %matches;
        printf "%6s: %s\n", $regex, $o;
    }
}

Output (each match is underlined when the script runs in a terminal; the underlining does not survive in this plain-text copy, so the six lines for each string look identical here):

  /^/g:  a 
 /^/gm:  a 
  /$/g:  a 
 /$/gm:  a 
  /./g:  a 
 /./gs:  a 
  /^/g:  a ␊ 
 /^/gm:  a ␊ 
  /$/g:  a ␊ 
 /$/gm:  a ␊ 
  /./g:  a ␊ 
 /./gs:  a ␊ 
  /^/g:  a ␊ b 
 /^/gm:  a ␊ b 
  /$/g:  a ␊ b 
 /$/gm:  a ␊ b 
  /./g:  a ␊ b 
 /./gs:  a ␊ b 
  /^/g:  a ␊ ␊ b 
 /^/gm:  a ␊ ␊ b 
  /$/g:  a ␊ ␊ b 
 /$/gm:  a ␊ ␊ b 
  /./g:  a ␊ ␊ b 
 /./gs:  a ␊ ␊ b 
  /^/g:  a ␊ b ␊ c ␊ 
 /^/gm:  a ␊ b ␊ c ␊ 
  /$/g:  a ␊ b ␊ c ␊ 
 /$/gm:  a ␊ b ␊ c ␊ 
  /./g:  a ␊ b ␊ c ␊ 
 /./gs:  a ␊ b ␊ c ␊ 
  /^/g:  a ␊ b ␊ c ␊ d 
 /^/gm:  a ␊ b ␊ c ␊ d 
  /$/g:  a ␊ b ␊ c ␊ d 
 /$/gm:  a ␊ b ␊ c ␊ d 
  /./g:  a ␊ b ␊ c ␊ d 
 /./gs:  a ␊ b ␊ c ␊ d 

* Update: Note that the perlre section "Repeated Patterns Matching a Zero-length Substring" is relevant here (example).
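A minimal sketch of why this matters, using an illustrative helper name, `zero_width_positions`: a zero-width pattern under /g does not match twice at the same position, so a loop over /$/gm moves on to the next candidate position and eventually stops instead of spinning forever. The loop guard is only a safety net for experimenting with other patterns.

```perl
use strict;
use warnings;

# Illustrative helper: positions at which /$/gm matches, collected one
# scalar-context match at a time.
sub zero_width_positions {
    my ($str) = @_;
    my @positions;
    while ($str =~ /$/gm) {
        push @positions, pos($str);
        die "runaway loop" if @positions > 100;   # safety net, not expected to fire
    }
    return @positions;
}

print join(' ', zero_width_positions("a\nb")), "\n";   # 1 3
```

After the zero-width match at position 1, Perl will not accept another zero-width match at that same position, so the next match is found at the end of the string (position 3), and the one after that fails.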


In reply to Re: Applying regex to each line in a record. by haukex
in thread Applying regex to each line in a record. by pritesh_ugrankar
