in reply to Re: Extract first word from certain lines in a file
in thread Extract first word from certain lines in a file
For a common suffix routine the regex engine's brute force approach comes in handy:
It's not efficient for large strings but I think it's neat anyway. :-)sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] } sub common_suffix { $_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s; return $1; }
Obvious optimizations can be done, such as reorder the arguments so that the string matched against ($_[0]) is the shorter string, or shrink the longer string, but I didn't want to clutter the essence of the routine.
To get the common suffix of a list, use &reduce from List::Util:
use List::Util qw/ reduce /; print reduce { common_suffix($a, $b) } qw/ redball greenball stall /; __END__ all
As the regex might be a bit cryptic, here's how it works:
A bit reformatted, it looks like this.
First it matches the whole first string. Then it sees if that can be found at the end of the second string. If it can, the match was successful and the match is returned. If it can't, it's forced to fail. That failure results in the engine trying again, this time moving one character into the first string. This is repeated until there's a match, which there always is. The least possible common suffix is the empty string.sub common_suffix { $_[0] =~ m[ (?> (.+) ) (?(?{ not is_suffix($_[1], $^N) }) (?!) ) ]xs; return $1; }
There's some special assertions in the pattern. The first is (?>). It's the non-backtracking subpattern assertion. The pattern inside it can backtrack, but once a match inside it is found, it's "fixed". If the next part of the pattern (outside (?>)) fails, all of (?>) fail.
When the first part of the pattern has matched the rest of the string, the regex version of COND ? TRUE : FALSE, (?(COND)TRUE|FALSE) is utilized to see if the match can be found at the end of the other string. This is done by a code assertion, (?{}). If the code assertion evaluates to true the TRUE subexpression inside it will be evaulated. The FALSE subexpression is optional and if not specified and the condition evaluates to false the assertion will match/succeed, i.e. (?(COND)TRUE) is the same as (?(COND)TRUE|) and an empty pattern always matches.
In the conditional I use $^N which is the value of most recently closed capturing group, which is $1 in my pattern.
The subpattern inside the conditional match is (?!) which is the negative lookahead assertion. I don't have anything inside the assertion, so the subpattern in it (the empty pattern) always matches, so the assertion fails. This is a way to effectively force the engine to backtrack.
That's all.
Related documents:
| • perlvar | - Perl predefined variables |
| • perlreref | - Perl regular expressions reference |
| • perlre | - Perl regular expressions |
ihb
See perltoc if you don't know which perldoc to read!
Read argumentation in its context!
|
|---|