Hello bsherkhane,

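The escape sequences \1, \2, \3, etc., are backreferences: used inside a regex, they match again whatever the corresponding capture group of that same regex has already matched. The special variables $1, $2, $3, etc., hold the same captures, but they are read outside the pattern (in the replacement part of a substitution, or in later code) and refer to the most recent successful match. $1 refers to the first capture, $2 to the second capture, and so on. Captures are numbered by counting the opening parentheses of capture groups from left to right, so a nested group gets a higher number than the group that encloses it. See perlre#Capture-groups.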
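To make the difference concrete, here is a minimal sketch. The sample strings and the variable names ($text, $fixed) are made up for illustration and are not taken from the thread:

```perl
use strict;
use warnings;

my $text = 'hello hello world';   # hypothetical sample data

# \1 inside the pattern refers back to what group 1 has already
# matched during this same match attempt.
if ($text =~ /\b(\w+)\s+\1\b/) {
    # $1 is read after the match has succeeded and holds that capture.
    print "doubled word: $1\n";            # prints "doubled word: hello"
}

# In a substitution, \1 belongs in the pattern, $1 in the replacement.
(my $fixed = $text) =~ s/\b(\w+)\s+\1\b/$1/;
print "$fixed\n";                          # prints "hello world"

# Nested groups are numbered by their opening parentheses, left to right.
if ('2024-05' =~ /((\d+)-(\d+))/) {
    print "1=$1 2=$2 3=$3\n";              # prints "1=2024-05 2=2024 3=05"
}
```

Note that the outer group in the last pattern is capture 1 even though it closes last, because its opening parenthesis comes first.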

The module YAPE::Regex::Explain is a useful tool for understanding regular expressions. Here is the explanation it gives for the left-hand side (i.e., the regex part) of the substitution in question:

#! perl
use strict;
use warnings;
use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new
(
    qr{ \b ( (\d+) \s \S+ ) (.*?) \s \2 \s (\S+) }x

)->explain();

Output:

17:26 >perl 1526_SoPW.pl
The regular expression:

(?x-ims: \b ( (\d+) \s \S+ ) (.*?) \s \2 \s (\S+) )

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  \2                       what was matched by capture \2
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

17:26 >
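To see how those four captures come out in practice, the same regex can be run against a small string. This is only a sketch: the input '1 a 2 b 1 c' is a made-up sample, not your actual data, and the names $re, $input and @caps are mine:

```perl
use strict;
use warnings;

# The same pattern that was passed to YAPE::Regex::Explain above.
my $re = qr{ \b ( (\d+) \s \S+ ) (.*?) \s \2 \s (\S+) }x;

my $input = '1 a 2 b 1 c';   # hypothetical sample: the key 1 appears twice

# In list context a successful match returns the captures ($1 .. $4).
my @caps = $input =~ $re;

if (@caps) {
    printf "1=[%s] 2=[%s] 3=[%s] 4=[%s]\n", @caps;
    # prints "1=[1 a] 2=[1] 3=[ 2 b] 4=[c]"
}
else {
    print "no match\n";
}
```

Capture 2 is the key, and \2 forces the same key to appear again later in the string; capture 3 is whatever lies between the two occurrences, and capture 4 is the value that follows the second one.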

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re^2: print identical keys once along with their values by Athanasius
in thread print identical keys once along with their values by bsherkhane
