comment on

In general, map blocks can be rewritten like the following (I'm simplifying a little bit), although one would usually use a lexical loop variable like for my $elem (@in) ... instead of $_.

my @out = map { ... } @in;
# - becomes -
my @out;
for $_ (@in) {
    my @result = ...;
    push @out, @result;
}
[download]

The regex /^(\S+)\s++(.+)/ is splitting the input string on the first whitespace (it is equivalent to my ($left,$right) = split /\s+/, $str, 2;, see split). Using the ternary ?: operator, if the regex matches, the block of code will return the string "(?:$1(?{'$2'}))", and if it doesn't match, die is called. So in this case the map operation is not returning a hash (or a list of key-value pairs), but just one output string for each input string, the input strings being one line of the regexes each.

So with the join '|', ..., as you said the code is constructing a single regex. The general process of doing so is something I discussed in my tutorial Building Regex Alternations Dynamically, but this one is a bit more specialized. For the names, tybalt89 is using a neat trick using (?{...}), which allows you to insert arbitrary code into a regular expression, the return value of the most recent code is then stored in the special variable $^R. The use re 'eval'; is necessary because these (?{}) blocks are being interpolated from strings into the regex, so this is a security feature of Perl.

Consider this regex (I'm using the /x modifier for readability): m{ ^[a-zA-Z]\w+$ (?{'one'}) | ^[0-9]\w+$ (?{'two'}) }x. When matching against the string "3abc", it will match the second alternation, that is, ^[0-9]\w+$, and then it will execute (?{'two'}), and since the last value in that piece of code is 'two', that is what it returns and what $^R gets set to. After the regex has executed and matched successfully, you can simply look at the value of $^R to see which of the two patterns contained in the regex were matched.

Minor edits for clarity.

In reply to Re^3: Pattern Identification by haukex
in thread Pattern Identification by WhiteTraveller

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.