utku has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am looking for wisdom about nested matches. I have the following string:

$fc = '#4
r0 !
r1 !
r2 !
#5
r3 !
r4 !
r5 !
';

What I want to obtain is:

#4
r2 !
#5
r5 !

That is, for each match of (#\d+) followed by ((\S+\s+\S+\s+)*), I need to keep the first item and then only the last item of the second, nested match. The following code works:

while ($fc =~ /#(\d+)\s+((\S+\s+\S+\s+)*)/g) {
    $time  = $1;
    $list  = $2;
    @rlist = split /\n/, $list;
    $ofc  .= "#$time\n$rlist[$#rlist]\n";
}

What I don't like is that I need to store the new string in a second variable such as $ofc. Is there a way to get rid of the second variable and reduce this loop to a single substitution? I tried:

$fc =~ s/(#\d+)\s+((\S+\s+\S+\s+)*)/$1$&/g;

... which doesn't do the job I want. Holy monks, do I want too much? Could a substitution with s///g be faster than global matching in this case?

Replies are listed 'Best First'.
Re: substitute instead global matching for nested matches
by GrandFather (Saint) on Mar 18, 2012 at 22:51 UTC

    Think lines and grab just those you want. Consider:

    use strict;
    use warnings;

    my $fc = <<FC;
    #4
    r0 !
    r1 !
    r2 !
    #5
    r3 !
    r4 !
    r5 !
    FC

    print "$1\n$2\n" while $fc =~ /^(#\d+).*?^.*?^(.*?)$/gms;

    Prints:

    #4
    r0 !
    #5
    r3 !

    Note the regex switches: g to match multiple times, s to let . match newline characters, and m to perform a multi-line match, which allows the internal ^ and $ anchors to work.
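
To make the effect of those switches concrete, here is a small sketch on a shortened copy of the sample data. The variable names are only illustrative, and the layout (one item per line) is an assumption based on the heredoc above.

```perl
use strict;
use warnings;

# Shortened sample data, one item per line (assumed layout).
my $text = "#4\nr0 !\nr1 !\n";

# Without /m, ^ only matches at the very start of the string.
my @no_m = $text =~ /^(\S+)/g;

# With /m, ^ also matches right after each embedded newline.
my @with_m = $text =~ /^(\S+)/gm;

# Without /s, . stops at a newline; with /s it can cross line boundaries.
my ($no_s)   = $text =~ /#4(.*)/;
my ($with_s) = $text =~ /#4(.*)/s;

print "without /m: @no_m\n";
print "with /m:    @with_m\n";
```

Here @no_m holds only the first token, while @with_m holds the first token of every line. $no_s is empty because . cannot step over the newline after #4, whereas $with_s contains the rest of the string.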

    True laziness is hard work

      This returns the first element in the 2nd match. I need to get the last element. So the result must be

      #4
      r2 !
      #5
      r5 !

      Also, your code is restricted to exactly 3 consecutive elements, one per line. I need a more generic solution because the length of my lists varies. Sorry, I did not mention that clearly in my previous post (the repeated match ((\S+\s+\S+\s+)*) was meant to imply that). So $fc could be

      #4
      r1 !
      #5
      r3 !
      r7 !
      r10 !
      #8
      r0 !
      r1 !

      I need to get the last r member. Any help appreciated.

        The following meets your current spec as I understand it:

        use strict;
        use warnings;

        my $fc = <<FC;
        #4
        r2 !
        #5
        r3 !
        r7 !
        r10 !
        #8
        r0 !
        r1 !
        FC

        print "$1\n$2\n" while $fc =~ /^(#\d+)[^#]*^([^#\n]+)$/gm;

        Prints:

        #4
        r2 !
        #5
        r10 !
        #8
        r1 !
        True laziness is hard work
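
The pattern above can also drive an in-place substitution, which is what the original question asked for. The following is only a sketch, not something posted in the thread. It assumes one item per line, at least one item after every #N header, and no '#' inside the items.

```perl
use strict;
use warnings;

# Sample data from the post above, one item per line (assumed layout).
my $fc = "#4\nr2 !\n#5\nr3 !\nr7 !\nr10 !\n#8\nr0 !\nr1 !\n";

# Keep each "#N" header and only the last line of the list that follows it.
# [^#]* swallows the earlier list lines; backtracking leaves the last one for $2.
$fc =~ s/^(#\d+)\n[^#]*^([^#\n]+)$/$1\n$2/gm;

print $fc;
```

This rewrites $fc directly, so no second variable such as $ofc is needed. Whether it is faster than the matching loop would have to be measured on real data.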
Re: substitute instead global matching for nested matches
by wrog (Friar) on Mar 18, 2012 at 23:30 UTC
    First of all, I had to change that first line to
    while ($fc =~ /#(\d+)\s+(([^#\s]\S*\s+\S+\s+)*)/g ) {
    to get it to work the way you claimed. Otherwise that first match just grabs everything up to r5.
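
The difference between the two patterns can be checked with a short sketch. It assumes the sample string has one item per line, and it uses a non-capturing inner group (?:...) purely to keep the capture numbering simple; the variable names are illustrative.

```perl
use strict;
use warnings;

my $fc = "#4\nr0 !\nr1 !\nr2 !\n#5\nr3 !\nr4 !\nr5 !\n";

# Original pattern: \S+ also accepts "#5", so the first match runs on
# past the second header.
my ($greedy) = $fc =~ /#\d+\s+((?:\S+\s+\S+\s+)*)/;

# Fixed pattern: an item may not start with '#', so each list stops
# in front of the next header.
my @lists;
while ($fc =~ /#(\d+)\s+((?:[^#\s]\S*\s+\S+\s+)*)/g) {
    push @lists, [$1, $2];
}

print "original pattern captured:\n$greedy\n";
print "fixed pattern found ", scalar(@lists), " lists\n";
```

With the original pattern, $greedy contains the "#5" header and runs up to "r5", while the fixed pattern yields two separate lists, one per header.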

    I'm also still unclear on what sorts of things are expected on input, e.g., why is the split call using \n when the first regexp suggests that you're expecting things to be separated by general whitespace \s+ which may or may not have newlines in it (or have multiple newlines in it)?

    E.g., it's possible that

    join '', map {m/(\d*)\s+(?:\S+\s+\S+\s+)*(\S+\s+\S+\s+)$/ ? "#$1\n$2" : ()} split /^#/m, $fc
    is what you want, but I can't tell for sure.
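
For reference, here is the same expression wrapped in a small runnable script. The sample string and the $result variable are assumptions added for the demonstration; the expression itself is unchanged.

```perl
use strict;
use warnings;

# Sample data, one item per line (assumed layout).
my $fc = "#4\nr0 !\nr1 !\nr2 !\n#5\nr3 !\nr4 !\nr5 !\n";

# split cuts the string in front of every '#' at the start of a line;
# the leading empty chunk fails the match and is dropped by the map.
my $result = join '',
    map   { m/(\d*)\s+(?:\S+\s+\S+\s+)*(\S+\s+\S+\s+)$/ ? "#$1\n$2" : () }
    split /^#/m, $fc;

print $result;
```

Each chunk starts with the header number, and the final capture group picks up the last pair in that chunk.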
      Thanks for the first regexp fix; your improvement supports more input variations. The second one is also handy, but the regexp is long, i.e. you use the pattern \S+\s+\S+\s+ twice. I had thought it could be reduced to a single occurrence. However, it does its job as well. I really appreciate your help; it has given me new ideas, especially the first regexp tip.
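
One way to avoid writing the pair pattern twice is to compile it once with qr// and interpolate it. This is a sketch and not something posted in the thread; $pair and $out are hypothetical names, and the sample string assumes one item per line.

```perl
use strict;
use warnings;

my $fc = "#4\nr0 !\nr1 !\nr2 !\n#5\nr3 !\nr4 !\nr5 !\n";

# The "item, whitespace, marker, whitespace" pair, written once.
my $pair = qr/\S+\s+\S+\s+/;

# qr// wraps the pattern in its own group, so $pair* repeats the whole pair.
my $out = join '',
    map   { m/(\d*)\s+$pair*($pair)$/ ? "#$1\n$2" : () }
    split /^#/m, $fc;

print $out;
```

The regex engine still sees the pair twice, so this only shortens the source; it does not change how the match works.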