    if ($text =~ /\bstart\b(.*?)\bend\b/) {
        $result = $1;
        # do something with results
    }

Note that the . character matches any character but a newline (see the /s modifier under m// if you want to span lines), the * means match zero or more times, and the ? forces * to match as few times as possible -- so it will pick up the first end instead of the last one. The \b is in there to prevent mismatches on words like 'starting' and 'backend'.
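To make those points concrete, here is a small sketch that runs the same pattern against some made-up sample strings. The variable names and the sample text are my own assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Made-up sample text containing 'starting' and 'backend' as decoys.
my $text = "starting start one end backend start two end";

# Non-greedy with \b: stops at the first standalone 'end'.
my ($first) = $text =~ /\bstart\b(.*?)\bend\b/;

# Greedy: runs on to the last standalone 'end'.
my ($greedy) = $text =~ /\bstart\b(.*)\bend\b/;

# Without \b the match begins inside 'starting'.
my ($loose) = $text =~ /start(.*?)end/;

# With /g in list context you get every non-nested pair.
my @all = $text =~ /\bstart\b(.*?)\bend\b/g;

# . will not cross a newline unless you add /s.
my $multi = "start line one\nline two end";
my ($no_s)   = $multi =~ /\bstart\b(.*?)\bend\b/;
my ($with_s) = $multi =~ /\bstart\b(.*?)\bend\b/s;

print "first:  [$first]\n";
print "greedy: [$greedy]\n";
print "loose:  [$loose]\n";
print "all:    [", join('][', @all), "]\n";
print "no /s:  ", (defined $no_s ? "[$no_s]" : "no match"), "\n";
print "with /s: [$with_s]\n";
```

The /g form is probably what you want if the text holds several start/end pairs side by side rather than one inside another.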
This regex has the limitation of not catching nested starts and ends; in that case you might go the recursion route and write it as a function:
    sub between {
        my $text = shift;
        # Greedy .* grabs the outermost start/end pair, so each
        # call descends one level of nesting. (A non-greedy .*?
        # would stop at the first inner end and cut the nesting short.)
        if ($text =~ /start(.*)end/) {
            return between($1);
        }
        else {
            return $text;
        }
    }

That can become prohibitively expensive, depending on your data set. I suspect there's a more hideous solution involving split and join, but that's likely to be counterproductive at this point. It also depends on having balanced tags -- if you don't, don't do this!
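If the cost of recursion or the balanced-tags requirement worries you, one alternative is a single pass that keeps a stack of open starts. This is only a sketch under my own assumptions: balanced_spans is a hypothetical helper name, the keywords are matched as whole words, and it dies on unbalanced input rather than guessing.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside every start/end pair,
# innermost pairs first, and dies if the keywords do not balance.
sub balanced_spans {
    my ($text) = @_;
    my (@open, @spans);

    # Walk the keywords left to right in a single pass.
    while ($text =~ /\b(start|end)\b/g) {
        if ($1 eq 'start') {
            # Remember the offset just past this 'start'.
            push @open, pos($text);
        }
        else {
            die "'end' without a matching 'start'\n" unless @open;
            my $from = pop @open;
            my $to   = pos($text) - length('end');   # where this 'end' begins
            push @spans, substr($text, $from, $to - $from);
        }
    }
    die "'start' without a matching 'end'\n" if @open;
    return @spans;
}

my @spans = balanced_spans("start a start b end c end");
print "[$_]\n" for @spans;

# Unbalanced input is reported instead of silently mis-parsed.
my $ok = eval { balanced_spans("start a"); 1 };
print "unbalanced input rejected\n" unless $ok;
```

Each keyword is visited once, so the work grows with the length of the text rather than with the nesting depth, and the die calls make the "don't do this on unbalanced tags" rule explicit.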
In reply to Re: How do I extract all text between two keywords like start and end?
by chromatic
in thread How do I extract all text between two keywords like start and end?
by Anonymous Monk