comment on

For a common suffix routine the regex engine's brute force approach comes in handy:

sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] }

sub common_suffix {
    $_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s;
    return $1;
}
[download]

It's not efficient for large strings but I think it's neat anyway. :-)

Obvious optimizations can be done, such as reorder the arguments so that the string matched against ($_[0]) is the shorter string, or shrink the longer string, but I didn't want to clutter the essence of the routine.

To get the common suffix of a list, use &reduce from List::Util:

use List::Util qw/ reduce /;

print reduce { common_suffix($a, $b) } qw/ redball greenball stall /;

__END__
all
[download]

As the regex might be a bit cryptic, here's how it works:

A bit reformatted, it looks like this.

sub common_suffix {
    $_[0] =~ m[
        (?>
            (.+)
        )
        (?(?{ not is_suffix($_[1], $^N) })
            (?!)
        )
    ]xs;
    return $1;
}
[download]

First it matches the whole first string. Then it sees if that can be found at the end of the second string. If it can, the match was successful and the match is returned. If it can't, it's forced to fail. That failure results in the engine trying again, this time moving one character into the first string. This is repeated until there's a match, which there always is. The least possible common suffix is the empty string.

There's some special assertions in the pattern. The first is (?>). It's the non-backtracking subpattern assertion. The pattern inside it can backtrack, but once a match inside it is found, it's "fixed". If the next part of the pattern (outside (?>)) fails, all of (?>) fail.

When the first part of the pattern has matched the rest of the string, the regex version of COND ? TRUE : FALSE, (?(COND)TRUE|FALSE) is utilized to see if the match can be found at the end of the other string. This is done by a code assertion, (?{}). If the code assertion evaluates to true the TRUE subexpression inside it will be evaulated. The FALSE subexpression is optional and if not specified and the condition evaluates to false the assertion will match/succeed, i.e. (?(COND)TRUE) is the same as (?(COND)TRUE|) and an empty pattern always matches.

In the conditional I use $^N which is the value of most recently closed capturing group, which is $1 in my pattern.

The subpattern inside the conditional match is (?!) which is the negative lookahead assertion. I don't have anything inside the assertion, so the subpattern in it (the empty pattern) always matches, so the assertion fails. This is a way to effectively force the engine to backtrack.

That's all.

Related documents:

• perlvar - Perl predefined variables

• perlreref - Perl regular expressions reference

• perlre - Perl regular expressions

ihb

See perltoc if you don't know which perldoc to read!
Read argumentation in its context!

In reply to Re^2: Extract first word from certain lines in a file by ihb
in thread Extract first word from certain lines in a file by Anonymous Monk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.

• perlvar	- Perl predefined variables
• perlreref	- Perl regular expressions reference
• perlre	- Perl regular expressions