jithoosin has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks
Could any of the esteemed monks explain this regex?
This regex is used to get swf files from html pages. The regex is m/\.swf(?:\?.*)?$/oi. I don't understand the part inside the parentheses. Why is "?" used in the regex? After some searching on Google, I think "?:" marks something as optional. Thanks in advance.
Regards
Kiran.
Re: questions of a perl beginner on regex
by prasadbabu (Prior) on Jun 20, 2006 at 12:10 UTC
Hi jithoosin, take a look at perlre. '?:' makes the group non-capturing, so the matched content is not stored in memory (in $1). A '?' after an item makes that item optional (either present or not). To match a literal '?', the special character has to be escaped with a backslash: '\?'. Here is an explanation of the regex.
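use YAPE::Regex::Explain;
# Note: this string includes the m/ and /oi of the match operator, so the
# explainer treats 'm/' and '/oi' as literal text (see the output below).
my $REx = 'm/\.swf(?:\?.*)?$/oi';
my $exp = YAPE::Regex::Explain->new($REx)->explain;
print $exp;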
output:
-------
The regular expression:
(?-imsx:m/\.swf(?:\?.*)?$/oi)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  m/                       'm/'
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  swf                      'swf'
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \?                       '?'
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
  /oi                      '/oi'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Re: questions of a perl beginner on regex
by jhourcle (Prior) on Jun 20, 2006 at 12:15 UTC
\.          # the character '.'
swf         # the string 'swf'
(?: ... )   # don't save (capture) this expression
  \?        # the character '?'
  .*        # zero or more characters (not a newline)
?           # zero or one occurrences (of the grouping in parens)
$           # end of the line
/oi         # don't re-evaluate (compile once), case insensitive
So, as best I can tell, you're not only grabbing URLs that end in .swf, you're also grabbing URLs that pass a query string (i.e., URL?QUERY_STRING format).
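The breakdown above can be tried out directly. This is only a rough sketch: the URLs are made up for illustration, the helper name is_swf is hypothetical, and the /o flag is left out because it makes no difference for a pattern that interpolates no variables. The /x flag lets the pattern carry its own comments.

```perl
use strict;
use warnings;

# Same pattern as in the question, spread out with /x so that each piece
# can carry a comment.
my $swf_re = qr{
    \. swf         # a literal '.' followed by 'swf'
    (?: \? .* )?   # optionally: a literal '?' and the rest of the string
    $              # end of the string
}xi;

# Hypothetical helper: returns 1 if the URL matches, 0 otherwise.
sub is_swf {
    my ($url) = @_;
    return $url =~ $swf_re ? 1 : 0;
}

# Made-up URLs, invented for this example.
my @urls = (
    'http://example.com/movie.swf',
    'http://example.com/MOVIE.SWF?autoplay=1',
    'http://example.com/movie.swf.html',
    'http://example.com/page.html',
);

for my $url (@urls) {
    printf "%-42s %s\n", $url, is_swf($url) ? 'match' : 'no match';
}
```

The first two URLs match (plain .swf, and .swf followed by a query string); the last two do not, because after ".swf" the pattern allows only the end of the string or a "?".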
Re: questions of a perl beginner on regex
by liverpole (Monsignor) on Jun 20, 2006 at 12:13 UTC
Hi jithoosin,
It depends on which "?" you mean.
The typical use of "?" in a regex is "zero or one" of the preceding thing. Of course, another way to say "zero or one" is "optional". So, for example, in "(?:\?.*)?", the final "?" means that "(?:\?.*)" is optional.
The second "?" (in "\?") stands for a single "?" (since it's escaped with "\").
However, the first "?" (in "(?:...)") has a different meaning: it turns off the "capture" that would usually happen with "(...)".
You should read more about Perl regular expressions to get the full scoop.
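To see what "turning off the capture" means in practice, here is a small sketch. The URL and the two helper names are invented for the example; the only difference between the two patterns is "(" versus "(?:".

```perl
use strict;
use warnings;

# With a plain (...) group, the query string is captured into $1.
sub query_with_capture {
    my ($url) = @_;
    return $url =~ m/\.swf(\?.*)?$/i ? $1 : undef;
}

# With (?:...), the group still groups, but nothing is captured,
# so $1 stays undefined even though the match succeeds.
sub query_without_capture {
    my ($url) = @_;
    return $url =~ m/\.swf(?:\?.*)?$/i ? $1 : undef;
}

my $url = 'http://example.com/movie.swf?autoplay=1';   # made-up URL

my $with    = query_with_capture($url);
my $without = query_without_capture($url);

print "(...)   : \$1 is ", defined($with)    ? "'$with'"    : 'undef', "\n";
print "(?:...) : \$1 is ", defined($without) ? "'$without'" : 'undef', "\n";
```

Both patterns match the same strings; the non-capturing form just skips saving the group, which is all the original regex needs since it only tests for a match.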
s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: questions of a perl beginner on regex
by wazzuteke (Hermit) on Jun 20, 2006 at 12:23 UTC
The comments above have it pretty much right. The ? has a number of different meanings and can certainly be a little confusing. Overall, the ? can be very powerful when used in the right context for a particular problem. May I recommend taking a look at the Perl regex documentation; it has proved to be a pretty good resource for questions just like these:
http://www.perl.com/doc/manual/html/pod/perlre.html
Have a good one!
print map{chr}(45,45,104,97,124,124,116,97,45,45);
Re: questions of a perl beginner on regex
by derby (Abbot) on Jun 20, 2006 at 12:24 UTC
Others have answered your question. Basically you have three different types of question marks: ?:, \? and ? - non-capturing grouping, escaping, and zero or one matches. But looking at that regex, I wonder if it's all just overkill. It looks like you're matching a flash URL that may have parameters. I wonder if this simpler regex will do the job as well:
m/\.swf.*$/oi
Comments? I know some people wish death to dot-star but what's the downside here?
Maybe, maybe not. I think it's overkill, but your proposal is too simple. It might cause false matches if there were a 'www.swf.com' (or even 'www.swfa.com').
I'd probably use something like:
m/[.]swf(?:$|[?])/i
If we're not capturing, there's no need for '.*' to get the rest of the line, so a simple alternation (end of line, or a question mark) will work just fine.
It's almost as few characters to use the following, and there would hopefully be less confusion about what it's doing for novices:
m/\.swf$|\.swf\?/i
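For anyone who wants to compare the candidates side by side, here is a quick sketch. The four patterns are the ones from this thread (without /o); the labels, the helper matching_patterns, and the URLs are made up for the comparison.

```perl
use strict;
use warnings;

# The patterns discussed in this thread, with invented labels.
my @patterns = (
    [ 'original',   qr/\.swf(?:\?.*)?$/i ],
    [ 'dot-star',   qr/\.swf.*$/i        ],
    [ 'end-or-?',   qr/[.]swf(?:$|[?])/i ],
    [ 'two-branch', qr/\.swf$|\.swf\?/i  ],
);

# Hypothetical helper: returns the labels of all patterns matching the URL.
sub matching_patterns {
    my ($url) = @_;
    return map { $_->[0] } grep { $url =~ $_->[1] } @patterns;
}

# Made-up URLs, including the false-positive cases mentioned above.
my @urls = (
    'http://example.com/a.swf',
    'http://example.com/a.swf?x=1',
    'http://www.swf.com/index.html',
    'http://www.swfa.com/',
);

for my $url (@urls) {
    my @hits = matching_patterns($url);
    printf "%-32s %s\n", $url, @hits ? join(', ', @hits) : '(none)';
}
```

With these sample URLs, all four patterns accept the two real .swf links, and only the dot-star version also accepts the www.swf.com and www.swfa.com hosts.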