Re: multi-line regexp
by prasadbabu (Prior) on Dec 21, 2005 at 11:07 UTC
If I understood your question correctly, this will work.
'.' matches a newline character when you use the 's' modifier on your regex. Also take a look at perlre.
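#!/usr/bin/perl
use strict;
use warnings;
my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";
# Match 'aaaaa', then every character (newlines too, because of /s)
# up to, but not including, the next 'aaaaa'.
my @a = $str =~ /aaaaa(?:(?!aaaaa).)*/gs;
print "$a[0]\n$a[1]";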
Update: removed extra grouping.
Yes, it all makes more sense now, but your regexp is complex... I would like to understand what's going on there with all the ?: constructs. Could you add some description of what's going on there?
Thanks a lot
Luca
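jeanluca, '(?:...)' groups part of a regex without capturing, so the matched string is not stored in capture variables such as $1, $2, etc. '(?!...)' is a negative look-ahead: it succeeds only if the pattern inside it does not match at the current position, and it consumes no characters. Take a look at perlre.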
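To make the difference between the three constructs concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to show the three constructs.
my $text = "foobar foobaz";

# Capturing group: with /g in list context, the captured parts are returned.
my @captured = $text =~ /foo(bar|baz)/g;      # ("bar", "baz")

# Non-capturing group: nothing is captured, so the whole matches are returned.
my @whole = $text =~ /foo(?:bar|baz)/g;       # ("foobar", "foobaz")

# Negative look-ahead: match 'foo' only where 'bar' does not follow.
my @not_bar = $text =~ /foo(?!bar)\w+/g;      # ("foobaz")

print join(",", @captured), "\n";
print join(",", @whole), "\n";
print join(",", @not_bar), "\n";
```

The look-ahead only tests what follows; the characters it inspects are still available to the rest of the pattern, which is why `\w+` can go on to match `baz`.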
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/(aaaaa(?:(?:(?!aaaaa).)*))/s)->explain'
The regular expression:
(?s-imx:(aaaaa(?:(?:(?!aaaaa).)*)))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?s-imx:                 group, but do not capture (with . matching
                         \n) (case-sensitive) (with ^ and $ matching
                         normally) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    aaaaa                    'aaaaa'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
          aaaaa                    'aaaaa'
----------------------------------------------------------------------
        )                        end of look-ahead
----------------------------------------------------------------------
        .                        any character
----------------------------------------------------------------------
      )*                       end of grouping
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
I spent all day trying to understand prasadbabu's code, then I asked for help on id-perl.
Someone named Jacinta Richardson explained it to me, and she said:
/
aaaaa            # Find me aaaaa
(?:              # Followed by, but do not capture
  (?:            # Group but do not capture
    (?!          # Something which is not
      aaaaa
    )
    .            # and any char including newline
  )*             # As many as possible
)
/gsx             # Repeat the match, dots can include newlines, x allows these comments
The first grouping is unnecessary, but not a problem.
Negative look-aheads ask the regular expression to look at the next value and only include it in the match if it does not match that part of the expression.
Thus the regular expression finds: aaaaa\nbbbbb\nccccc\n
in its first run, stopping at the next "aaaaa\n", which matches the pattern inside the negative look-ahead (so the look-ahead fails there), and then in its second run finds:
aaaaa\nddddd\neeeee\n
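The same idea can be wrapped in a small helper. This is only a sketch: `records_starting_with` is a hypothetical name, not something from the thread, and it assumes the record lead is a literal string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every record in $text
# that starts with the literal string $marker.
sub records_starting_with {
    my ($text, $marker) = @_;
    my $m = quotemeta $marker;      # treat the marker as literal text
    return $text =~ /
        $m              # the record lead
        (?:             # then, repeatedly:
            (?! $m )    #   as long as the lead does not start here,
            .           #   take one character (newlines too, via the s flag)
        )*
    /gsx;
}

my @records = records_starting_with(
    "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n", "aaaaa");
print scalar(@records), " records\n";    # prints "2 records"
```

Using the x flag keeps the commented layout valid Perl, and quotemeta guards against a marker that contains regex metacharacters.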
That's what she said, and then I realized that Jacinta Richardson is known as jarich here.
Thanks for your time, jarich, and I hope this helps jeanluca too.
Hi,
Your code works here and looks nice (at least to me). To be honest, I need more time to understand it, and I wish you could explain how it works (while I am reading my notes about pattern matching).
Anyway, I tried other ways, and so far I have written a little code like this:
$str = "aa\nbb\ncc\naa\ndd\nee\n";
@a = ($str =~ /aa\n.*\n.*/g);
print "1 = $a[0]\n2 = $a[1]\n";
And the other is :
$str = "aa\nbb\ncc\naa\ndd\nee\n";
@a = ($str =~ /(aa.*).?(aa.*)/gs);
print "1 = $a[0]\n2 = $a[1]\n";
But anyway, prasadbabu's code is nicer, which is why I asked for the explanation. Or do you see something bad in my code?
Update: pKai's code is simpler and easier for me to understand :)
Thanks, zak
Well, this code of yours makes some very specific assumptions about the input string:
- /aa\n.*\n.*/g assumes that every aa-line is followed by exactly 2 other lines, which have to be extracted in addition to the aa-lead.
- /(aa.*).?(aa.*)/gs extracts exactly 2 fields beginning with aa from the string.
The regexes with look-ahead were proposed to cover a wider range of input strings.
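A quick way to see the difference is to feed both styles an input that breaks the "exactly 2 lines" assumption. The input below is a made-up example with three records of different lengths.

```perl
use strict;
use warnings;

# Hypothetical input: three 'aa' records with 1, 3 and 0 follow-up lines.
my $str = "aa\nbb\naa\ncc\ndd\nee\naa\n";

# Fixed-shape pattern: 'aa' plus exactly two more lines.
my @fixed = $str =~ /aa\n.*\n.*/g;

# Look-ahead pattern: 'aa' plus everything up to the next 'aa'.
my @lookahead = $str =~ /aa(?:(?!aa).)*/gs;

print scalar(@fixed), " match(es) with the fixed pattern\n";      # 1
print scalar(@lookahead), " match(es) with the look-ahead\n";     # 3
```

The fixed pattern swallows the second aa-lead as part of its first match and then finds nothing else, while the look-ahead version returns one element per record.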
Re: multi-line regexp
by Happy-the-monk (Canon) on Dec 21, 2005 at 11:13 UTC
Any suggestions ?
See in perldoc perlre on line 30 what the m-switch does.
Without the s-switch, the dot (.) actually doesn't match a newline.
Cheers, Sören
Re: multi-line regexp
by blazar (Canon) on Dec 21, 2005 at 11:39 UTC
Your post is slightly confusing to me, and I suggest trying to be more accurate: e.g. s/muli/multi/, and use <code> (or <c>) tags!
If I get it right, though, you're just confusing /m with /s, which is not uncommon after all. When in doubt, always check perldoc perlre!
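For anyone else who mixes the two up, a small sketch of the difference. The sample text is invented for illustration: /s changes what the dot matches, while /m changes where ^ and $ match.

```perl
use strict;
use warnings;

# Hypothetical two-line sample.
my $text = "first\nsecond\n";

# No modifier: '.' stops at the newline.
my ($plain) = $text =~ /(f.*)/;              # "first"

# /s: '.' matches newlines too, so .* runs to the end of the string.
my ($dotall) = $text =~ /(f.*)/s;            # "first\nsecond\n"

# /m: ^ and $ match at every line boundary; '.' is unaffected.
my @with_m    = $text =~ /^(\w+)$/mg;        # ("first", "second")
my @without_m = $text =~ /^(\w+)$/g;         # ()

print "plain:  $plain\n";
print "with_m: @with_m\n";
print "without_m count: ", scalar(@without_m), "\n";
```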
Re: multi-line regexp
by pKai (Priest) on Dec 21, 2005 at 17:56 UTC
Re: multi-line regexp
by GrandFather (Saint) on Dec 21, 2005 at 19:50 UTC
This uses a non-greedy (minimal) match .*? and a positive look-ahead (?=...) containing the alternation aaaaa|$ to do the job:
#! /usr/bin/perl
use strict;
use warnings;
my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n" ;
my @a = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs ;
print join "\n", @a;
Prints:
aaaaa
bbbbb
ccccc

aaaaa
ddddd
eeeee
DWIM is Perl's answer to Gödel
my @a = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs ;
and
my @a = $str =~ /aaaaa.*?(?=aaaaa|$)/gs ;
right ?
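A short check of that, as a sketch: in list context with /g, a pattern without capturing groups returns the whole matches, and here the single group spans the whole match (the look-ahead consumes nothing), so both forms should give the same list. The variable names are my own.

```perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";

# With a capturing group: /g in list context returns the captured text.
my @with_group = $str =~ /(aaaaa.*?)(?=aaaaa|$)/gs;

# Without a group: /g in list context returns the whole matches.
my @without_group = $str =~ /aaaaa.*?(?=aaaaa|$)/gs;

# Indices at which the two result lists disagree.
my @diff = grep { $with_group[$_] ne $without_group[$_] } 0 .. $#with_group;
my $same = (@with_group == @without_group) && !@diff;

print $same ? "same\n" : "different\n";    # prints "same"
```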