Your problem is that a match succeeds if the pattern is found anywhere in the string. That sounds like an obvious statement, but think about the ramifications. The following will result in a match:

my $string = "abcdefDaveghijkl";
if ( $string =~ /Dave/ ) {
    print "I found Dave.\n";
}

It matches because Dave is found within the string you're searching.

Now look at the following example:

my $string = "1.23.45";
print "Found a number.\n" if $string =~ /\d+/;

A number is, of course, found within that string. The only real qualification you've given is that the match has to contain at least one and possibly more numeric digits. '1' qualifies.

Here is an example closer to your issue:

my $string = "1.234.567.txt";
print "Found a good filename.\n" if $string =~ /\d+\.txt/;

You've now told the regexp engine to return true if, anywhere in the string, it finds one or more digits followed by a literal period (dot), followed by the literal letters 'txt'. Given that description, the 567.txt portion of the string triggers a match, and you've done nothing to prevent the regexp engine from accepting it. Just as /Dave/ can be found in part of a string, 567.txt can as well. For example:

my $string = "1.234.567.txt";
print "Match.\n" if $string =~ /567\.txt/;

Well, 567.txt does exist within $string, doesn't it? So of course it matches. \d+\.txt is not so different, except that it will match any run of digits, not just 567.
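
If you want to see for yourself which part of the string the unanchored pattern grabs, a minimal sketch like the following may help. The capturing parentheses and the $matched variable are my additions for illustration; they are not part of your original regexp.

```perl
use strict;
use warnings;

my $string = "1.234.567.txt";

# Capture whatever the unanchored pattern actually matched.
# The parentheses and $matched are illustrative additions.
my ($matched) = $string =~ /(\d+\.txt)/;

print "The pattern matched '$matched' inside '$string'.\n";
```

The engine skips past the leading 1.234. because nothing there is followed by .txt, and settles on the first place where the whole pattern fits.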

What you really need to do is tell the regexp engine that your filename has to contain only digits preceding the '.txt' extension. As with anything in Perl, there is more than one way to do it. The simplest way is to anchor the match to the beginning of the string.

/^\d+\.txt/

But that would also match '4567.txtish', or even '5678.txt.90210', because you've left the door open for trailing stuff. If there is any chance of whitespace padding at the beginning or end of the string, you should take that into consideration, and you should also prevent unwanted characters past the '.txt' extension by anchoring at the end of the string as well:

my $testname = "     456.txt";
my $filename;
if ( ($filename) = $testname =~ /^\s*?(\d+\.txt)\s*?$/ ) {
   print "Found a file named $filename.\n";
}

Now the match is a lot more robust. The preceding regexp accepts only filenames of the form nnnn.txt (where nnnn is one or more digits). If the filename is preceded or followed by whitespace, that whitespace is permitted but ignored, and the match is anchored to both the beginning and the end of the string. That prevents a filename like 1234.txtish from being matched.

Though it is unnecessary in this case, I also made the quantifiers that absorb whitespace non-greedy. I did this not so much to make YOUR regexp more robust as to reinforce a good habit that makes many regexps more robust. If I remember right, Larry Wall's description of regexps in Perl 6 even discusses making non-greedy matching the default and greedy matching the exception. Non-greedy matches are, more often than not, what you're looking for.

And finally, this match assigns the portion that you consider to be a filename to the scalar variable $filename.
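
To compare the front-only anchor with the fully anchored version, here is a small sketch you could run. The extract_filename helper and the list of test names are made up for this illustration; the regexp is the same one discussed above, evaluated in list context so that the captured group is what gets returned.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original question): returns the bare
# filename if $name is only digits plus '.txt', optionally padded with
# whitespace; returns undef otherwise.
sub extract_filename {
    my ($name) = @_;

    # In list context the match returns the captured group.
    my ($filename) = $name =~ /^\s*?(\d+\.txt)\s*?$/;
    return $filename;
}

# Made-up test names, several borrowed from the examples in this post.
my @tests = (
    "     456.txt",
    "456.txt   ",
    "1.234.567.txt",
    "4567.txtish",
    "5678.txt.90210",
);

for my $test (@tests) {
    my $front_only = $test =~ /^\d+\.txt/ ? "yes" : "no";
    my $filename   = extract_filename($test);
    my $both       = defined $filename ? "yes ($filename)" : "no";
    print "'$test': front anchor only = $front_only, both anchors = $both\n";
}
```

With only the front anchor, '4567.txtish' and '5678.txt.90210' still slip through, while the padded "     456.txt" is rejected. With both anchors and the optional whitespace, the padded names are accepted and the ones with trailing junk are not.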

Hope this helps!

Dave

"If I had my life to do over again, I'd be a plumber." -- Albert Einstein

In reply to Re: regex matching number by davido
in thread regex matching number by InfiniteSilence
