multi-line regexp

jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Re: multi-line regexp by prasadbabu (Prior) on Dec 21, 2005 at 11:07 UTC
If I understood your question correctly, this will work. `.` matches a newline character when you use the `s` modifier in your regex. Also take a look at perlre.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";

# Each match: "aaaaa", then any characters (newlines included, thanks
# to /s) up to, but not including, the next "aaaaa"
my @a = $str =~ /aaaaa(?:(?!aaaaa).)*/gs;

print "$a[0]\n$a[1]";
```

Updated: removed extra grouping.

Thanks in advance
Prasad
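The same pattern can also be run in scalar context, one match per loop iteration. This is a minimal sketch reusing the sample string from the thread; `@blocks` is a name chosen only for illustration, and `$-[0]` is Perl's built-in start offset of the last successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";

# In scalar context, m//g finds one match per call and remembers
# where it stopped (pos), so the loop walks through the blocks.
my @blocks;
while ($str =~ /(aaaaa(?:(?!aaaaa).)*)/gs) {
    push @blocks, $1;
    # $-[0] holds the offset where the current match started
    printf "block at offset %d:\n%s", $-[0], $1;
}
```

Each block keeps its own trailing newline, because the `.` (under `/s`) consumes the `\n` in front of the next `aaaaa` before the look-ahead stops it.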
Re^2: multi-line regexp by jeanluca (Deacon) on Dec 21, 2005 at 11:34 UTC
Yes, it all makes more sense now, but your regexp is complex... I would like to understand what's going on there with all the `?:`. Could you add some description of what's going on there? Thanks a lot, Luca
Re^3: multi-line regexp by prasadbabu (Prior) on Dec 21, 2005 at 11:40 UTC
jeanluca, `?:` is used in regex grouping to avoid storing the matched string in capture variables like $1, $2, etc. `?!` is simply a negative look-ahead condition. Take a look at perlre.

Thanks in advance
Prasad
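To see the difference between the constructs named above, here is a small sketch on a made-up string (`foobar foobaz` is illustrative data only): a capturing group fills `$1`, a `(?:...)` group only groups, and `(?!...)` tests the following text without consuming it.

```perl
use strict;
use warnings;

my $s = "foobar foobaz";

# Capturing group: the matched text is stored in $1
my ($cap) = $s =~ /foo(ba.)/;            # "bar"

# Non-capturing group: nothing is stored, so on success the
# list-context result is just (1)
my @non = $s =~ /foo(?:ba.)/;            # (1)

# Negative look-ahead: "foo" NOT followed by "bar".
# It only checks, it does not consume, so only the second "foo" matches.
my @pos;
while ($s =~ /foo(?!bar)/g) {
    push @pos, $-[0];                    # start offset of each match
}

print "captured: $cap\n";
print "non-capturing result: @non\n";
print "foo not followed by bar at offset(s): @pos\n";
```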
Re^4: multi-line regexp by blazar (Canon) on Dec 22, 2005 at 09:38 UTC
Re^3: multi-line regexp by l3v3l (Monk) on Dec 21, 2005 at 17:42 UTC
Maybe this helps:

```
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/(aaaaa(?:(?:(?!aaaaa).)*))/s)->explain'
```

```
The regular expression:

(?s-imx:(aaaaa(?:(?:(?!aaaaa).)*)))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?s-imx:                 group, but do not capture (with . matching
                         \n) (case-sensitive) (with ^ and $ matching
                         normally) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    aaaaa                    'aaaaa'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
          aaaaa                    'aaaaa'
----------------------------------------------------------------------
        )                        end of look-ahead
----------------------------------------------------------------------
        .                        any character
----------------------------------------------------------------------
      )*                       end of grouping
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
```
Re^4: multi-line regexp by doctor_moron (Scribe) on Dec 23, 2005 at 11:23 UTC
Re^3: multi-line regexp by doctor_moron (Scribe) on Dec 23, 2005 at 11:06 UTC
I sat all day trying to understand prasadbabu's code, then I asked for help on id-perl. Someone named Jacinta Richardson explained it to me, and she said:

```perl
/
  aaaaa        # Find me aaaaa
  (?:          # Followed by, but do not capture
    (?:        # Group but do not capture
      (?!      # Something which is not
        aaaaa
      )
      .        # and any char including newline
    )*         # As many as possible
  )
/gsx           # Repeat the match, dots can include newlines;
               # /x allows this whitespace and these comments
```

The first grouping is unnecessary, but not a problem. Negative look-aheads ask the regular expression to look at the next value and only include it in the match if it does not match that part of the expression. Thus the regular expression finds `aaaaa\nbbbbb\nccccc\n` in its first run, stopping at the `aaaaa\n` which matches the negative look-ahead, and then in its second run finds `aaaaa\nddddd\neeeee\n`.

That's what she said, and then I realized that Jacinta Richardson is known as jarich here. Thanks for your time, jarich, and I hope this helps jeanluca too.
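The look-ahead mechanism described above can be checked in isolation. This sketch uses a hypothetical string, anchors the tempered dot `(?:(?!aaaaa).)*` at the start with `\A`, and shows where it stops, with and without `/s`.

```perl
use strict;
use warnings;

# Hypothetical sample: two lines, then the "aaaaa" marker
my $text = "xx\nyy\naaaaa\nzz\n";

# (?:(?!aaaaa).)* takes one character at a time, but only after the
# negative look-ahead confirms that "aaaaa" does not start there.
my ($before) = $text =~ /\A((?:(?!aaaaa).)*)/s;   # "xx\nyy\n"

# Without /s, "." refuses "\n" too, so it stops at the first newline
my ($no_s) = $text =~ /\A((?:(?!aaaaa).)*)/;      # "xx"

print "with /s:    [$before]\n";
print "without /s: [$no_s]\n";
```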
Re^2: multi-line regexp by doctor_moron (Scribe) on Dec 21, 2005 at 17:38 UTC
Hi, your code works here and it looks nice (at least to me). To be honest, I need more time to understand it; I wish you could explain how your code works (while I read my notes about pattern matching). Anyway, I tried other ways, and so far I've written this little bit of code:

```perl
$str = "aa\nbb\ncc\naa\ndd\nee\n";
@a = ($str =~ /aa\n.*\n.*/g);
print "1 = $a[0]\n2 = $a[1]\n";
```

And the other one is:

```perl
$str = "aa\nbb\ncc\naa\ndd\nee\n";
@a = ($str =~ /(aa.*).*?(aa.*)/gs);
print "1 = $a[0]\n2 = $a[1]\n";
```

But anyway, prasadbabu's code is nicer, which is why I asked for the explanation. Or do you see something bad in my code?

Update: pKai's code is simpler and easier for me to understand :)

Thanks, zak
Re^3: multi-line regexp by pKai (Priest) on Dec 21, 2005 at 18:25 UTC
Well, this code of yours makes some very specific assumptions about the input string: `/aa\n.*\n.*/g` assumes that every aa-line is followed by exactly 2 other lines, which have to be extracted in addition to the aa-lead. `/(aa.*).*?(aa.*)/gs` extracts exactly 2 fields, each beginning with aa. The regexes with look-ahead were proposed to cover a wider range of input strings.
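One way to see this point is to feed both styles an input whose first block has three lines after the `aa` line instead of two. The input below is hypothetical, and the look-ahead pattern is adapted from the look-ahead replies in this thread, with `aa` as the marker.

```perl
use strict;
use warnings;

# Hypothetical input: the first "aa" block has three lines after it
my $str = "aa\nbb\ncc\nxx\naa\ndd\nee\n";

# Fixed-structure pattern: exactly two lines after each "aa" line
my @fixed = $str =~ /aa\n.*\n.*/g;
# ("aa\nbb\ncc", "aa\ndd\nee"): the "xx" line is silently lost

# Look-ahead pattern: everything up to the next "aa" or the end
my @flex = $str =~ /aa.*?(?=aa|$)/gs;
# ("aa\nbb\ncc\nxx\n", "aa\ndd\nee")

print scalar(@fixed), " fixed matches, ", scalar(@flex), " flexible matches\n";
```

Both return two elements, but only the look-ahead version keeps every line of the first block.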
Re: multi-line regexp by Happy-the-monk (Canon) on Dec 21, 2005 at 11:13 UTC
> Any suggestions?

See what the `m` switch does in `perldoc perlre`, on line 30. Without the `s` switch, the dot (`.`) actually doesn't match a newline.

Cheers, Sören
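The difference between the two switches, as a minimal sketch on a two-line sample string: `/s` changes what `.` matches, while `/m` changes where `^` and `$` match.

```perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\n";

# /s: "." may also match "\n"
my ($no_s)   = $str =~ /(aaaaa.*)/;    # "aaaaa"
my ($with_s) = $str =~ /(aaaaa.*)/s;   # "aaaaa\nbbbbb\n"

# /m: "^" may match at the start of every line; "." is unaffected
my @no_m   = $str =~ /^(\w)/g;         # ("a")
my @with_m = $str =~ /^(\w)/mg;        # ("a", "b")
```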
Re: multi-line regexp by blazar (Canon) on Dec 21, 2005 at 11:39 UTC
Your post is slightly confusing to me, and I suggest trying to be more accurate: e.g. `s/muli/multi/`, and use `<code>` (or `<c>`) tags! If I get it right, though, you're just confusing `/m` with `/s`, which is not uncommon after all. When in doubt, always check perldoc perlre!
Re: multi-line regexp by pKai (Priest) on Dec 21, 2005 at 17:56 UTC
Some variation on the theme: `/(aaaaa.*?)(?=aaaaa|$)/sg`

A little bit more straightforward, as it avoids negative look-ahead, which also confuses me on various occasions ;-)
Re: multi-line regexp by GrandFather (Saint) on Dec 21, 2005 at 19:50 UTC
This uses a minimal (non-greedy) match `.*?` and a positive look-ahead `(?=...)` with an alternation `aaaaa|$` to do the job:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";
my @a = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs;
print join "\n", @a;
```

Prints:

```
aaaaa
bbbbb
ccccc

aaaaa
ddddd
eeeee
```

DWIM is Perl's answer to Gödel
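One detail in the output above: the blank line between `ccccc` and the second `aaaaa`, and the missing newline at the very end, both come from `$`. Without `/m`, `$` also matches just before a final newline, so on the last block the lazy `.*?` stops there, while the first block keeps its newline because it ends right before the next `aaaaa`. This sketch compares `$` with `\z`, which only matches at the true end of the string.

```perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";

# "$" without /m: end of string OR just before a final "\n"
my @with_dollar = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs;
# "\z": only the real end of the string
my @with_z      = $str =~ /(aaaaa.*?)(?=aaaaa|\z)/gs;

# The last block is one character shorter with "$" (no trailing "\n")
printf "last block with \$: %d chars, with \\z: %d chars\n",
    length($with_dollar[-1]), length($with_z[-1]);
```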
Re^2: multi-line regexp by doctor_moron (Scribe) on Dec 21, 2005 at 20:41 UTC
And I think in this case there is no difference between `my @a = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs;` and `my @a = $str =~ /aaaaa.*?(?=aaaaa|$)/gs;`, right?
Re^3: multi-line regexp by GrandFather (Saint) on Dec 21, 2005 at 21:07 UTC
Interesting. Yes you are right and I've learned something. Thank you :)
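Why the parentheses can be dropped here: in list context, `m//g` returns the captured groups of every match when the pattern has groups, and the whole matches when it has none. Because the look-ahead is zero-width, the whole match and `$1` are the same text. A sketch, with a second pattern (`/aaaaa\n(\w+)/g`, made up for illustration) where the two do differ:

```perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";

# No groups: each element is the whole match
my @whole    = $str =~ /aaaaa.*?(?=aaaaa|$)/gs;
# One group: each element is $1 of a match
my @captured = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs;
print "identical\n" if "@whole" eq "@captured";

# Here the group covers only part of each match, so the results differ
my @first_lines = $str =~ /aaaaa\n(\w+)/g;   # ("bbbbb", "ddddd")
my @full        = $str =~ /aaaaa\n\w+/g;     # ("aaaaa\nbbbbb", "aaaaa\nddddd")
```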
Re^4: multi-line regexp by Anonymous Monk on Dec 21, 2005 at 22:26 UTC