substitute instead global matching for nested matches

utku has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, looking for wisdom about nested matches. I have the following string:

$fc = '#4 
r0 !
r1 !
r2 !
#5
r3 !
r4 !
r5 !
';

What I want to obtain is:

#4
r2 !
#5 
r5 !

.. which means that for each match of (#\d+) followed by ((\S+\s+\S+\s+)*), I need to keep the first item and then only the last element of the second, nested match. The following code works:

while ($fc =~ /#(\d+)\s+((\S+\s+\S+\s+)*)/g ) {
    $time = $1; $list = $2;
    @rlist = split /\n/, $list;
    $ofc .= "#$time\n$rlist[$#rlist]\n";
}

.. but what I don't like is that I need to store the new string in a second variable such as $ofc. Is there a way to get rid of the second variable and thus reduce this loop to one single substitution? Something like:

$fc =~ s/(#\d+)\s+((\S+\s+\S+\s+)*)/$1$&/g;

.. which doesn't do the job I want. Holy monks, do I want too much? Could a substitution with s///g be faster than global matching in this case?


Re: substitute instead global matching for nested matches by GrandFather (Saint) on Mar 18, 2012 at 22:51 UTC
Think lines and grab just those you want. Consider:

use strict;
use warnings;

my $fc = <<FC;
#4
r0 !
r1 !
r2 !
#5
r3 !
r4 !
r5 !
FC

print "$1\n$2\n" while $fc =~ /^(#\d+).*?^.*?^(.*?)$/gms;

Prints:

#4
r0 !
#5
r3 !

Note the regex switches: g to match multiple times, s to let . match newline characters, and m to perform a multi-line match, which allows the internal ^ and $ anchors to work.

True laziness is hard work

Re^2: substitute instead global matching for nested matches by utku (Acolyte) on Mar 18, 2012 at 23:06 UTC
This returns the first element of the second match, but I need the last element. So the result must be:

#4
r2 !
#5
r5 !

Your code is also restricted to three consecutive elements on separate lines. I need a more generic solution because the length of my lists varies. Sorry, I did not mention that clearly in my previous post (the global match ((\S+\s+\S+\s+)*) implies it). So $fc could be:

#4
r1 !
#5
r3 !
r7 !
r10 !
#8
r0 !
r1 !

I need to get the last r member of each list. Any help appreciated.

Re^3: substitute instead global matching for nested matches by GrandFather (Saint) on Mar 18, 2012 at 23:19 UTC
The following meets your current spec as I understand it:

use strict;
use warnings;

my $fc = <<FC;
#4
r2 !
#5
r3 !
r7 !
r10 !
#8
r0 !
r1 !
FC

print "$1\n$2\n" while $fc =~ /^(#\d+)[^#]*^([^#\n]+)$/gm;

Prints:

#4
r2 !
#5
r10 !
#8
r1 !

True laziness is hard work

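The pattern in the reply above can also be used in an in-place substitution, which speaks to the original wish to avoid a second variable. This is only a minimal sketch: the helper name `keep_last_entries` is hypothetical, and it assumes that list entries never contain a `#` and that every header is followed by at least one entry.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: rewrites the string so that
# each "#N" header is followed only by the last line of its list.
sub keep_last_entries {
    my ($fc) = @_;
    # $1 = header; [^#]* greedily skips the earlier lines of the list;
    # $2 = last line before the next header (or the end of the string).
    $fc =~ s/^(#\d+)[^#]*^([^#\n]+)$/$1\n$2/gm;
    return $fc;
}

my $fc = <<'FC';
#4
r2 !
#5
r3 !
r7 !
r10 !
#8
r0 !
r1 !
FC

print keep_last_entries($fc);
```

For the sample data this should print the same three header/entry pairs as the `while` loop. Trailing whitespace after a header (as in the `#4 ` of the original question) is swallowed by `[^#]*` and therefore dropped.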
Re^4: substitute instead global matching for nested matches by utku (Acolyte) on Mar 18, 2012 at 23:27 UTC
Re: substitute instead global matching for nested matches by wrog (Friar) on Mar 18, 2012 at 23:30 UTC
First of all, I had to change that first line to

while ($fc =~ /#(\d+)\s+(([^#\s]\S*\s+\S+\s+)*)/g ) {

to get it to work the way you claimed. Otherwise that first match just grabs everything up to `r5`.

I'm also still unclear on what sorts of things are expected on input. E.g., why is the split call using `\n` when the first regexp suggests that you're expecting things to be separated by general whitespace `\s+`, which may or may not have newlines in it (or may have multiple newlines in it)? It's possible that

join '', map {m/(\d*)\s+(?:\S+\s+\S+\s+)*(\S+\s+\S+\s+)$/ ? "#$1\n$2" : ()} split /^#/m, $fc

is what you want, but I can't tell for sure.

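For reference, the split/map idea from this reply can be wrapped into a small runnable script. This is a sketch under assumptions: the sub name `last_pairs` is hypothetical, the `*` quantifiers in the pattern are what the expression appears to intend, and each list entry is taken to be exactly two whitespace-separated tokens.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split/map expression.
sub last_pairs {
    my ($fc) = @_;
    return join '', map {
        # $1 = header digits, $2 = last "token token" pair of the chunk,
        # including its trailing whitespace.
        m/(\d*)\s+(?:\S+\s+\S+\s+)*(\S+\s+\S+\s+)$/ ? "#$1\n$2" : ()
    } split /^#/m, $fc;    # one chunk per header; the '#' itself is removed
}

my $fc = "#4 \nr0 !\nr1 !\nr2 !\n#5\nr3 !\nr4 !\nr5 !\n";
print last_pairs($fc);
```

The empty leading field that `split` produces before the first `#` does not match the pattern, so the `: ()` branch silently drops it.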
Re^2: substitute instead global matching for nested matches by utku (Acolyte) on Mar 20, 2012 at 13:56 UTC
Thanks for the first regexp fix; your improvement supports more input variations. The second one is also handy, but the regexp is long, i.e. you use the match pattern \S+\s+\S+\s+ twice. I had thought it could be reduced to a single matcher. However, it does its job as well. I really appreciate your help. It has given me new ideas, especially the first regexp tip.
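On the point about writing the `\S+\s+\S+\s+` pattern only once: in Perl, a capture group inside a repeated group keeps the text of its last iteration, so one pair pattern may be enough. The following is a sketch, not something proposed in the thread. The sub name `last_of_each` is hypothetical, and it assumes every entry is a two-token pair whose first token does not start with `#`.

```perl
use strict;
use warnings;

# Hypothetical helper: single in-place substitution, pair pattern used once.
sub last_of_each {
    my ($fc) = @_;
    # The group (?: ... )+ repeats over every pair of the list; after the
    # match, $2 holds only the pair captured by the final repetition.
    $fc =~ s/#(\d+)\s+(?:([^#\s]\S*\s+\S+)\s+)+/#$1\n$2\n/g;
    return $fc;
}

my $fc = "#4 \nr0 !\nr1 !\nr2 !\n#5\nr3 !\nr4 !\nr5 !\n";
print last_of_each($fc);
```

This modifies the string without a second accumulator variable. Whether it is faster than the `while` loop is not claimed here; the core `Benchmark` module could be used to measure both on real data.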