Re: multi-line regexp

Re^2: multi-line regexp by jeanluca (Deacon) on Dec 21, 2005 at 11:34 UTC
Yes, it all makes more sense now, but your regexp is complex... I would like to understand what's going on there with all the ?: groups. Could you add some description of what's going on there? Thanks a lot, Luca
Re^3: multi-line regexp by prasadbabu (Prior) on Dec 21, 2005 at 11:40 UTC
jeanluca, '?:' is used in regex grouping to avoid storing the matched string in the system variables like $1, $2, etc. '?!' is simply a negative lookahead condition. Take a look at perlre. Thanks in advance, Prasad
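A minimal sketch of the difference between a capturing group, a non-capturing `(?:...)` group, and a `(?!...)` negative lookahead. The sample string and variable names are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to show the three constructs.
my $text = "foobar foobaz";

# Capturing group: the alternation that matched is stored in $1,
# which a list-context match returns.
my ($captured) = $text =~ /foo(bar|baz)/;

# Non-capturing group: (?:...) groups without filling a numbered variable,
# so the only capture here comes from the outer parentheses.
my @groups = $text =~ /(foo(?:bar|baz))/;

# Negative lookahead: match "foo" only when "bar" does not follow.
# The lookahead itself consumes no characters.
my $pos = $text =~ /foo(?!bar)/ ? $-[0] : -1;

print "captured: $captured\n";
print "groups:   @groups\n";
print "foo not followed by bar starts at offset $pos\n";
```

The first "foo" is rejected by the lookahead because "bar" follows it, so the match lands on the second "foo".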
Re^4: multi-line regexp by blazar (Canon) on Dec 22, 2005 at 09:38 UTC
Right! Except that I wouldn't call them "system variables". How 'bout "numbered match variables" instead?
Re^3: multi-line regexp by l3v3l (Monk) on Dec 21, 2005 at 17:42 UTC
maybe this helps:

`perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/(aaaaa(?:(?:(?!aaaaa).)*))/s)->explain'`

The regular expression:

(?s-imx:(aaaaa(?:(?:(?!aaaaa).)*)))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?s-imx:                 group, but do not capture (with . matching
                         \n) (case-sensitive) (with ^ and $ matching
                         normally) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    aaaaa                    'aaaaa'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
          aaaaa                    'aaaaa'
----------------------------------------------------------------------
        )                        end of look-ahead
----------------------------------------------------------------------
        .                        any character
----------------------------------------------------------------------
      )*                       end of grouping
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Re^4: multi-line regexp by doctor_moron (Scribe) on Dec 23, 2005 at 11:23 UTC
Hi, do we need the 'g' modifier here? I tried your pattern with this code:

`$str = "aaaaa\nbbbbb\nccccc\nddddd\naaaaa\nddddd\neeeee\n"; @a = $str =~ /(aaaaa(?:(?:(?!aaaaa).)*))/s; print "1 : $a[0]\n 2 : $a[1]\n";`

and this code only matches "aaaaa bbbbb ccccc ddddd" ($a[0]). If we put g in the pattern match, I mean like this:

`$str = "aaaaa\nbbbbb\nccccc\nddddd\naaaaa\nddddd\neeeee\n"; @a = $str =~ /(aaaaa(?:(?:(?!aaaaa).)*))/gs; print "1 : $a[0]\n 2 : $a[1]\n";`

this will print out "aaaaa bbbbb ccccc ddddd" and "aaaaa ddddd eeeee".

thanks, zak
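The behaviour described above can be sketched as follows: in list context, a match without /g returns only the captures of the first match, while /g returns the captures of every match. This assumes the pattern carries a `*` on the inner group, as the YAPE::Regex::Explain output in this thread indicates; the variable names `@once` and `@all` are just for illustration.

```perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\nddddd\naaaaa\nddddd\neeeee\n";

# Assumed form of the pattern under discussion; the s flag lets . match "\n".
my $re = qr/(aaaaa(?:(?:(?!aaaaa).)*))/s;

# Without g: list context yields the capture groups of the first match only.
my @once = $str =~ /$re/;

# With g: list context yields the capture of every successive match.
my @all = $str =~ /$re/g;

printf "without g: %d chunk(s)\n", scalar @once;
printf "with g:    %d chunk(s)\n", scalar @all;
```

So the 'g' modifier is what makes the second chunk show up in `$a[1]`; without it, `$a[1]` is undefined.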
Re^3: multi-line regexp by doctor_moron (Scribe) on Dec 23, 2005 at 11:06 UTC
I sat all day trying to understand prasadbabu's code, then I asked id-perl for help. Someone named Jacinta Richardson explained it to me, and she said:

`/
  aaaaa           # Find me aaaaa
  (?:             # Followed by, but do not capture
    (?:           # Group but do not capture
      (?!aaaaa)   # Something which is not aaaaa
      .           # and any char including newline
    )*            # As many as possible
  )
/gsx              # Repeat the match, dots can include newlines`

The first grouping is unnecessary, but not a problem. Negative look-aheads ask the regular expression to look at the next value and only include it in the match if it does not match that part of the expression. Thus the regular expression finds:

aaaaa\nbbbbb\nccccc\n

in its first run, stopping at the "aaaaa\n" which matches the negative look-ahead, and then in its second run finds:

aaaaa\nddddd\neeeee\n

That's what she said, and then I realized that Jacinta Richardson is known as jarich here. Thanks for your time, jarich, and I hope this helps jeanluca too.
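A runnable sketch of the commented pattern explained above, assuming the look-ahead is meant to contain the literal aaaaa and that the x modifier is added so the inline comments are legal. The outer capturing parentheses and the name `@runs` are additions for this example.

```perl
use strict;
use warnings;

my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";

# Note: comments inside an x-mode pattern must not contain the delimiter.
my @runs = $str =~ /
    (                   # capture one whole record
        aaaaa           # the record starts with the marker
        (?:             # group, but do not capture
            (?!aaaaa)   # the next characters must not start another marker
            .           # then take one character, newline included
        )*              # as many times as possible
    )
/gsx;

print scalar(@runs), " records\n";
print "--\n$_" for @runs;
```

Each run stops just before the next aaaaa, which is why the second record is found as a separate match.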
Re^2: multi-line regexp by doctor_moron (Scribe) on Dec 21, 2005 at 17:38 UTC
Hi, your code works here and it looks nice (at least to me). To be honest, I need more time to understand your code; I wish you could explain how it works (while I am reading my notes about pattern matching). Anyway, I tried other ways, and so far I made a little code like:

`$str = "aa\nbb\ncc\naa\ndd\nee\n"; @a = ($str =~ /aa\n.*\n.*/g); print "1 = $a[0]\n2 = $a[1]\n";`

And the other is:

`$str = "aa\nbb\ncc\naa\ndd\nee\n"; @a = ($str =~ /(aa.).?(aa.)/gs); print "1 = $a[0]\n2 = $a[1]\n";`

But anyway, prasadbabu's code is nicer, that's why I asked for the explanation. Or do you see something bad in my code?

Update: pKai's code is simpler and easier for me to understand :)

thanks, zak
Re^3: multi-line regexp by pKai (Priest) on Dec 21, 2005 at 18:25 UTC
Well, this code of yours makes some very specific assumptions about the input string:

`/aa\n.*\n.*/g` assumes that every aa-line is followed by exactly 2 other lines, which have to be extracted in addition to the aa-lead.

`/(aa.).?(aa.)/gs` extracts exactly 2 fields from the string beginning with aa.

The regexes with look-ahead were proposed to cover a wider range of input strings.
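To illustrate the point about assumptions, here is a small sketch with a made-up input in which the records vary in length. The fixed-shape pattern is my reading of the first regex quoted above (an aa line plus two more lines); the look-ahead pattern is a shortened variant of the one discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical input: records start with an "aa" line but differ in length.
my $str = "aa\nbb\naa\ncc\ndd\nee\naa\n";

# Fixed shape: an aa line followed by exactly two more lines.
# Without the s flag, . stops at each newline.
my @fixed = $str =~ /(aa\n.*\n.*)/g;

# Look-ahead: an aa line plus everything up to the next "aa".
my @flexible = $str =~ /(aa(?:(?!aa).)*)/gs;

printf "fixed shape: %d record(s)\n", scalar @fixed;
printf "look-ahead:  %d record(s)\n", scalar @flexible;
```

On this input the fixed-shape pattern swallows the second aa line into its first match and then finds nothing else, while the look-ahead version splits the string into three records.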