in reply to multi-line regexp

If I understood your question correctly, this will work.

'.' also matches the newline character when you use the 's' modifier on your regex. Also take a look at perlre.

#!/usr/bin/perl
$str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";
@a = $str =~ /aaaaa(?:(?!aaaaa).)*/gs;
print "$a[0]\n$a[1]";

updated: removed extra grouping.
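
To see what the 's' modifier changes on its own, here is a minimal sketch. The shortened sample string and the variable names are only for illustration and are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened sample string, assumed for illustration.
my $str = "aaaaa\nbbbbb\n";

# Without /s, '.' does not match "\n", so .* stops at the end of the first line.
my ($without_s) = $str =~ /(aaaaa.*)/;

# With /s, '.' also matches "\n", so .* runs to the end of the string.
my ($with_s) = $str =~ /(aaaaa.*)/s;

printf "without /s: %d characters matched\n", length $without_s;
printf "with /s:    %d characters matched\n", length $with_s;
```

Without the modifier only the first line is matched; with it the match runs across the line breaks, which is why the look-ahead is needed to stop it at the next aaaaa.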

Thanks in advance

Prasad

Re^2: multi-line regexp
by jeanluca (Deacon) on Dec 21, 2005 at 11:34 UTC
    Yes, it all makes more sense now, but your regexp is complex...
    I would like to understand what's going on there with all the ?: groups. Could you add a description of what they do?

    Thanks a lot
    Luca
      jeanluca, '?:' is used in a regex group to avoid storing the matched string in the system variables like $1, $2, etc. '?!' is a negative lookahead condition. Take a look at perlre.

      Thanks in advance

      Prasad

        Right! Except that I wouldn't call them "system variables". How 'bout "numbered match variables" instead?
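
        A small sketch of the difference, in case it helps. The sample string "aaaaa-bbbbb" and the array names are made up for this illustration:

        ```perl
        use strict;
        use warnings;

        # Hypothetical sample string, not taken from the original question.
        my $str = "aaaaa-bbbbb";

        # Two capturing groups: the list-context match returns ($1, $2).
        my @with_capture = $str =~ /(a+)-(b+)/;

        # (?:...) still groups a+, but stores nothing, so only (b+) is returned as $1.
        my @without_capture = $str =~ /(?:a+)-(b+)/;

        print scalar(@with_capture), " value(s) captured with (a+)\n";
        print scalar(@without_capture), " value(s) captured with (?:a+)\n";
        ```

        The grouping behaves the same in both patterns; the only difference is whether a numbered match variable gets filled.
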
      maybe this helps:
      perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/(aaaaa(?:(?:(?!aaaaa).)*))/s)->explain'
      The regular expression:

      (?s-imx:(aaaaa(?:(?:(?!aaaaa).)*)))

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?s-imx:                 group, but do not capture (with . matching
                               \n) (case-sensitive) (with ^ and $ matching
                               normally) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          aaaaa                    'aaaaa'
      ----------------------------------------------------------------------
          (?:                      group, but do not capture:
      ----------------------------------------------------------------------
            (?:                      group, but do not capture (0 or more
                                     times (matching the most amount
                                     possible)):
      ----------------------------------------------------------------------
              (?!                      look ahead to see if there is not:
      ----------------------------------------------------------------------
                aaaaa                    'aaaaa'
      ----------------------------------------------------------------------
              )                        end of look-ahead
      ----------------------------------------------------------------------
              .                        any character
      ----------------------------------------------------------------------
            )*                       end of grouping
      ----------------------------------------------------------------------
          )                        end of grouping
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

        Hi, do we need the 'g' modifier here? I tried your pattern with this code:

        $str = "aaaaa\nbbbbb\nccccc\nddddd\naaaaa\nddddd\neeeee\n";
        @a = $str =~ /(aaaaa(?:(?:(?!aaaaa).)*))/s;
        print "1 : $a[0]\n 2 : $a[1]\n";

        and this code only matches "aaaaa bbbbb ccccc ddddd" ($a[0]).

        If we add g to the pattern match, like this:

        $str = "aaaaa\nbbbbb\nccccc\nddddd\naaaaa\nddddd\neeeee\n";
        @a = $str =~ /(aaaaa(?:(?:(?!aaaaa).)*))/gs;
        print "1 : $a[0]\n 2 : $a[1]\n";

        it prints both "aaaaa bbbbb ccccc ddddd" and "aaaaa ddddd eeeee".

        thanks, zak
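
        The two results can be compared side by side. This is only a sketch using the same string and pattern; the array names are my own:

        ```perl
        use strict;
        use warnings;

        my $str = "aaaaa\nbbbbb\nccccc\nddddd\naaaaa\nddddd\neeeee\n";

        # Without /g, a list-context match returns the captures of the first match only,
        # so the array gets a single element and $first_only[1] stays undefined.
        my @first_only = $str =~ /(aaaaa(?:(?:(?!aaaaa).)*))/s;

        # With /g, the match is repeated and the captures of every match are returned.
        my @all = $str =~ /(aaaaa(?:(?:(?!aaaaa).)*))/gs;

        print scalar(@first_only), " element(s) without /g\n";
        print scalar(@all), " element(s) with /g\n";
        ```

        So yes, 'g' is needed whenever more than the first block should end up in the array.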

      I spent all day trying to understand prasadbabu's code, then I asked for help on id-perl.

      Then someone named Jacinta Richardson explained it to me, and she said:

      /
        aaaaa          # Find me aaaaa
        (?:            # Followed by, but do not capture
          (?:          # Group but do not capture
            (?!aaaaa)  # Something which is not aaaaa
            .          # and any char including newline
          )*           # As many as possible
        )
      /gsx             # Repeat the match, dots can include newlines
                       # (x permits this commented layout)

      The first grouping is unnecessary, but not a problem.

      Negative look-aheads ask the regular expression to look at the next value and only include it in the match if it does not match that part of the expression.

      Thus the regular expression finds: aaaaa\nbbbbb\nccccc\n

      in its first run, stopping at the second "aaaaa\n", where the negative look-ahead refuses to go further, and then in its second run finds: aaaaa\nddddd\neeeee\n

      That's what she said, and then I realized that Jacinta Richardson is known as jarich here.

      Thanks for your time, jarich, and I hope this helps jeanluca too.
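
      The two runs can also be watched directly. This is a rough sketch that loops over the matches in scalar context and records where each one starts and ends; the @spans array and the other variable names are my own additions:

      ```perl
      use strict;
      use warnings;

      my $str = "aaaaa\nbbbbb\nccccc\naaaaa\nddddd\neeeee\n";
      my @spans;

      # In scalar context, each /g match resumes where the previous one ended.
      while ($str =~ /aaaaa(?:(?!aaaaa).)*/gs) {
          # $-[0] and $+[0] are the start and end offsets of the whole match.
          my ($start, $end) = ($-[0], $+[0]);
          push @spans, [$start, $end];
          print "run ", scalar(@spans), ": offsets $start to $end\n";
      }
      ```

      The first run ends exactly where the second aaaaa begins, and the second run continues to the end of the string.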

Re^2: multi-line regexp
by doctor_moron (Scribe) on Dec 21, 2005 at 17:38 UTC
    Hi,

    Your code works here and it looks nice (at least to me). To be honest, I need more time to understand it, and I wish you could explain how it works (while I am reading my notes about pattern matching).

    Anyway, I tried other ways, and so far I have written a little code like:

    $str = "aa\nbb\ncc\naa\ndd\nee\n";
    @a = ($str =~ /aa\n.*\n.*/g);
    print "1 = $a[0]\n2 = $a[1]\n";

    And the other is:

    $str = "aa\nbb\ncc\naa\ndd\nee\n";
    @a = ($str =~ /(aa.*).?(aa.*)/gs);
    print "1 = $a[0]\n2 = $a[1]\n";

    But anyway, prasadbabu's code is nicer, which is why I asked for the explanation. Or do you see something bad in my code?

    Update: pKai's code is simpler and easier for me to understand :)

    thanks, zak

      Well, this code of yours makes some very specific assumptions about the input string:
      • /aa\n.*\n.*/g assumes that every aa-line is followed by exactly 2 other lines, which have to be extracted in addition to the aa-lead.
      • /(aa.*).?(aa.*)/gs extracts exactly 2 fields from the string, each beginning with aa.
      The regexes with look-ahead were proposed to cover a wider range of input strings.
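
      To illustrate the first point, here is a sketch with a made-up input in which the first aa-line is followed by three lines and the second by only one. The input and the array names are assumptions for the example, not taken from the original question:

      ```perl
      use strict;
      use warnings;

      # Hypothetical input: three lines after the first "aa", one line after the second.
      my $str = "aa\nbb\ncc\nxx\naa\ndd\n";

      # Fixed-shape pattern: always takes exactly two lines after "aa".
      my @fixed = $str =~ /aa\n.*\n.*/g;

      # Look-ahead pattern: takes everything up to the next "aa".
      my @lookahead = $str =~ /aa(?:(?!aa).)*/gs;

      # Print each match with its newlines made visible.
      for my $pair ([fixed => \@fixed], [lookahead => \@lookahead]) {
          my ($name, $matches) = @$pair;
          for my $match (@$matches) {
              (my $shown = $match) =~ s/\n/\\n/g;
              print "$name: $shown\n";
          }
      }
      ```

      With this input the fixed-shape pattern silently drops the xx line from the first block, while the look-ahead pattern keeps it.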