Use <code> and </code> around your code to make it easier to read when posting. When you show the output of a print statement, show just the output, not your idea of what $1 or $2 is (i.e. not "$1 = I have"). Run the print statement and show exactly what it prints.

Do it like this:

$_ = "I have 2 numbers: 53147"; 
if (/(.*?)(\d+)/) 
{ 
    print "Beginning is <$1>,number is <$2>.\n"; 
}

#prints: Beginning is <I have >,number is <2>.
#  Don't tell me that $1 = "I have ".
#  Just execute the print statement and show the output.

First, it is almost always a bad idea to assign to $_ explicitly, and I recommend against doing that. Better is to use your own lexical variable:

use warnings;
use strict;

my $string = "I have 2 numbers: 53147"; 
if ($string =~ /(.*?)(\d+)/) 
{ 
    print "Beginning is <$1>,number is <$2>.\n"; 
}

#prints: Beginning is <I have >,number is <2>.
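One reason for that advice, sketched minimally: $_ is a global variable, and inside a foreach loop with no loop variable it is an alias for the current element. A subroutine that assigns to $_ can therefore silently overwrite the caller's data. The sub name clobber and the array contents are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper that carelessly assigns to the global $_
sub clobber {
    $_ = "oops";
}

my @names = ('alice', 'bob');
for (@names) {    # $_ is an alias for each element of @names in turn
    clobber();    # ...so the assignment inside clobber() overwrites it
}
print "@names\n";
# prints: oops oops
```

With a lexical such as my $string, no other subroutine can reach your variable this way unless you pass it in.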

What you have here is called a "regular expression", or "regex": m/(.*?)(\d+)/ (the m, for "match", is implicit when you use slashes). The (.*?) part says: match the shortest run of any characters (possibly even zero characters) that still allows the next term, (\d+), to match, if that is possible at all.
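To see that (.*?) really can match zero characters, try a string that starts with a digit. The example string here is an assumption chosen just for this demonstration:

```perl
use strict;
use warnings;

# Example string (made up) that starts with a digit
my $string = "53147 is my number";
my ($beginning, $number);
if ($string =~ /(.*?)(\d+)/) {
    ($beginning, $number) = ($1, $2);
    # (.*?) matches zero characters here, because a digit comes first
    print "Beginning is <$beginning>,number is <$number>.\n";
}
# prints: Beginning is <>,number is <53147>.
```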

So, for this string, (.*?) captures all the characters up to, but not including, the first digit; note that this includes the space just before that digit. (\d+) then captures that digit plus any digits immediately following it. That is how $1 ends up as "I have " and $2 as "2".
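A common alternative to reading $1 and $2 directly: in list context a successful match returns the captured groups, so you can copy them into named variables in one step. A minimal sketch using the original string (the variable names are just illustrative):

```perl
use strict;
use warnings;

my $string = "I have 2 numbers: 53147";

# In list context the match returns ($1, $2), or an empty list on failure
my ($beginning, $number) = $string =~ /(.*?)(\d+)/
    or die "no digits found in <$string>\n";
print "Beginning is <$beginning>,number is <$number>.\n";
# prints: Beginning is <I have >,number is <2>.
```

The "or die" works because a list assignment in scalar context yields the number of values on the right-hand side, which is 0 when the match fails.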

You should experiment when faced with a regex like this. Change the string to, say, "I have 6718 numbers: 53147" and see what it prints: Beginning is <I have >,number is <6718>. The "2" has become "6718", just as the previous paragraph predicts.
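One easy way to run such experiments is to loop over a few test strings with the same regex. A small sketch; the third string is made up to show what happens when there is no match:

```perl
use strict;
use warnings;

my @strings = (
    "I have 2 numbers: 53147",
    "I have 6718 numbers: 53147",
    "no digits here",
);
my @results;
for my $string (@strings) {
    if ($string =~ /(.*?)(\d+)/) {
        push @results, "<$1> <$2>";
        print "Beginning is <$1>,number is <$2>.\n";
    }
    else {
        push @results, "no match";
        print "No match for <$string>\n";
    }
}
```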

Now, let's experiment more. That ? in the first capture group matters a lot! The ? makes the .* "minimal": it matches as few characters as possible. Suppose we remove the ?:

my $string = "I have 2 numbers: 53147"; 
if ($string =~ /(.*)(\d+)/) 
{ 
    print "Beginning is <$1>,number is <$2>.\n"; 
}
#prints: Beginning is <I have 2 numbers: 5314>,number is <7>.

(.*) means: match the longest possible string that still allows (\d+) to match. The regex engine first lets .* take the whole string, then gives characters back one at a time from the right until (\d+) can match. The first point where that works leaves only the final "7" for (\d+), so (.*) gets everything in front of it. (\d+) still matches, albeit with just a single digit!
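Putting the versions side by side makes the difference easy to see. A minimal sketch on the same string; the third pattern, (.*\D)(\d+), is an extra variation added here to show one way to stay greedy but still capture the whole last number, by requiring a non-digit right before it:

```perl
use strict;
use warnings;

my $string = "I have 2 numbers: 53147";

# Non-greedy: (.*?) takes as little as possible, so (\d+) gets the first number
my ($lazy_begin, $lazy_num) = $string =~ /(.*?)(\d+)/;

# Greedy: (.*) takes as much as possible, leaving (\d+) a single digit
my ($greedy_begin, $greedy_num) = $string =~ /(.*)(\d+)/;

# Greedy, but a non-digit (\D) must precede the digits, so the last full number is kept
my ($last_begin, $last_num) = $string =~ /(.*\D)(\d+)/;

print "lazy:   <$lazy_begin> <$lazy_num>\n";
print "greedy: <$greedy_begin> <$greedy_num>\n";
print "last:   <$last_begin> <$last_num>\n";
# prints:
# lazy:   <I have > <2>
# greedy: <I have 2 numbers: 5314> <7>
# last:   <I have 2 numbers: > <53147>
```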

Let's say that you knew there were at least two numbers (sequences of digits) in this string and wanted the first two:

my $string = "I have 325 numbers: 98765  12324";  
if ($string =~ /(\d+)\D+(\d+)/) 
{ 
    print "Beginning is <$1>,number is <$2>.\n"; 
}
#prints: Beginning is <325>,number is <98765>.

The regex says: capture the first sequence of digits, ignore a sequence of one or more non-digits and then capture the next sequence of digits.
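If you don't know in advance how many numbers the string holds, a common alternative is the /g modifier in list context, which returns every match of the capture group. A sketch using the same string:

```perl
use strict;
use warnings;

my $string = "I have 325 numbers: 98765  12324";

# With /g in list context, the match returns all captured digit runs
my @numbers = $string =~ /(\d+)/g;
print "Found ", scalar(@numbers), " numbers: @numbers\n";
# prints: Found 3 numbers: 325 98765 12324
```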

This whole business of regexes can become VERY complicated. The classic book on the subject is Mastering Regular Expressions by Jeffrey Friedl. Fortunately, the vast majority of regexes don't require anywhere near the knowledge needed to understand Friedl's book!

In Perl:

\d, a digit [0-9]                         \D, a non-digit
\w, a word character [a-zA-Z0-9_]         \W, a non-word character
\s, a whitespace character [ \t\n\r\f]    \S, a non-whitespace character

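is normally all you need to know, along with some simple rules about minimal (.*?) and maximal (.*) matches. (The brackets show the ASCII meaning. On Unicode strings, \d, \w and \s also match non-ASCII characters unless you add the /a modifier, and since Perl 5.18 \s also matches the vertical tab.)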
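A quick sketch exercising \D, \s and \w on the string from earlier, plus a check of how \d treats a non-ASCII digit. The character U+0663 (ARABIC-INDIC DIGIT THREE) is just an example choice, and the /a modifier assumes Perl 5.14 or later:

```perl
use strict;
use warnings;

my $string = "I have 325 numbers: 98765  12324";

# \D: delete every non-digit, leaving only the digits
(my $digits_only = $string) =~ s/\D//g;

# \s+: split on runs of whitespace (the double space yields no empty field)
my @fields = split /\s+/, $string;

# \w+: runs of word characters; note that digits count as word characters
my @words = $string =~ /(\w+)/g;

print "digits: $digits_only\n";
print "fields: ", join('|', @fields), "\n";
print "words:  ", join('|', @words), "\n";

# \d follows Unicode rules on this string; /a limits \d to [0-9]
my $arabic_three = "\x{0663}";    # ARABIC-INDIC DIGIT THREE
my $unicode_rule = ($arabic_three =~ /\d/)  ? 1 : 0;
my $ascii_rule   = ($arabic_three =~ /\d/a) ? 1 : 0;
print "\\d matches U+0663: $unicode_rule; with /a: $ascii_rule\n";
```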

In reply to Re^3: What is the output for this ?? by Marshall
in thread What is the output for this ?? by sreenath
