questions of a perl beginner on regex

by jithoosin (Scribe)
on Jun 20, 2006 at 11:58 UTC ( #556354=perlquestion )

jithoosin has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

Could any of the esteemed monks explain this regex? It is used to get swf files from HTML pages.
The regex is m/\.swf(?:\?.*)?$/oi. I don't understand the part inside the parentheses. Why is "?" used in the regex? After some searching on Google, I think "?:" is some kind of optional marker. Thanks in advance.


Replies are listed 'Best First'.
Re: questions of a perl beginner on regex
by prasadbabu (Prior) on Jun 20, 2006 at 12:10 UTC

    Hi jithoosin, take a look at perlre. '(?:...)' groups without capturing, so the matched content is not stored in memory (no $1). A bare '?' makes the preceding item optional (either present or not). To match a literal '?', the special character has to be escaped with a backslash: '\?'. Here is an explanation of the whole regex.

    use YAPE::Regex::Explain;

    # Note: the m/ and /oi are part of the string here, so the explanation
    # below lists them as literal text, not as delimiters and modifiers.
    my $REx = 'm/\.swf(?:\?.*)?$/oi';
    my $exp = YAPE::Regex::Explain->new($REx)->explain;
    print $exp;

    output:
    -------
    The regular expression:

    (?-imsx:m/\.swf(?:\?.*)?$/oi)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      m/                       'm/'
    ----------------------------------------------------------------------
      \.                       '.'
    ----------------------------------------------------------------------
      swf                      'swf'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        \?                       '?'
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )?                       end of grouping
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
      /oi                      '/oi'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------


Re: questions of a perl beginner on regex
by jhourcle (Prior) on Jun 20, 2006 at 12:15 UTC
    \.          # the character '.'
    swf         # the string 'swf'
    (?: ... )   # don't save this expression
      \?        # the character '?'
      .*        # zero or more characters (not a newline)
    ?           # zero or one occurrences (of the grouping in parens)
    $           # end of the line
    /oi         # don't re-evaluate, case insensitive.

    So, as best I can tell, you're not only grabbing URLs that end in .swf, you're also grabbing URLs that pass a query string after it (i.e., URL?QUERY_STRING format).
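    To see that behaviour in practice, here is a minimal sketch that runs the regex from the question against a few URLs. The URLs and the helper name is_swf_url are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample URLs, invented for this example.
my @urls = (
    'http://example.com/movie.swf',
    'http://example.com/movie.SWF?autoplay=1',
    'http://example.com/movie.swf.html',
    'http://example.com/page.html?file=movie.swf',
);

# Hypothetical helper: returns 1 if the URL matches the regex
# from the question, 0 otherwise.
sub is_swf_url {
    my ($url) = @_;
    return $url =~ m/\.swf(?:\?.*)?$/oi ? 1 : 0;
}

for my $url (@urls) {
    printf "%-45s %s\n", $url, is_swf_url($url) ? 'match' : 'no match';
}
```

    The first two URLs match, the third does not (".swf" is followed by ".html" rather than by "?" or the end of the string), and the fourth matches as well because the string happens to end in ".swf". The regex only looks at the tail of the string; it knows nothing about URL structure.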

Re: questions of a perl beginner on regex
by liverpole (Monsignor) on Jun 20, 2006 at 12:13 UTC
    Hi jithoosin,

    It depends on which "?" you mean.

    The typical use of "?" in a regex is "zero or one" of the preceding thing. Of course, another way to say "zero or one" is "optional". So, for example, in "(?:\?.*)?", the final "?" means that "(?:\?.*)" is optional.

    The second "?" (in "\?") stands for a single "?" (since it's escaped with "\").

    However, the first "?" (in "(?:...)") has a different meaning: it turns off the "capture" that would usually happen with "(...)".
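    A small sketch of the difference between "(...)" and "(?:...)", assuming a made-up URL and two hypothetical helper subs (swf_query and swf_match_list are names invented for this example):

```perl
use strict;
use warnings;

# Capturing version: "(\?.*)" saves the query string in $1.
# Returns the query string (including the leading "?"), or undef
# if there is no query string or the URL does not match at all.
sub swf_query {
    my ($url) = @_;
    return $url =~ m/\.swf(\?.*)?$/i ? $1 : undef;
}

# Non-capturing version: "(?:\?.*)" groups but saves nothing, so a
# successful match in list context simply returns (1).
sub swf_match_list {
    my ($url) = @_;
    my @result = $url =~ m/\.swf(?:\?.*)?$/i;
    return @result;
}

my $url = 'http://example.com/movie.swf?autoplay=1';   # made-up URL

print "captured: ", swf_query($url), "\n";
print "non-capturing match returns: ", join(',', swf_match_list($url)), "\n";
```

    With the capturing group the first line prints "?autoplay=1"; with the non-capturing group there is nothing to return except the success value 1. Both patterns match exactly the same strings, so "(?:...)" is the natural choice when you only need the grouping for the trailing "?".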

    You should read more about perl regular expressions to get the full scoop.

Re: questions of a perl beginner on regex
by wazzuteke (Hermit) on Jun 20, 2006 at 12:23 UTC
    All the above comments have it pretty much right. The fun ? has a number of hidden meanings and can certainly be a little confusing. Overall, the ? can be very powerful when used in the right context for a particular problem. May I recommend taking a look at the Perl regex documentation; it has proved to be a pretty good resource for questions just like these.

    Have a good one!

    print map{chr}(45,45,104,97,124,124,116,97,45,45);
Re: questions of a perl beginner on regex
by derby (Abbot) on Jun 20, 2006 at 12:24 UTC

    Others have answered your question. Basically you have three different types of question marks: ?:, \? and ? (non-capturing grouping, an escaped literal, and zero or one matches). But looking at that regex, I wonder if it's all just overkill. It looks like you're matching a flash URL that may have parameters. I wonder if this simpler regex will do the job as well:

    Comments? I know some people wish death to dot-star but what's the downside here?


      Maybe, maybe not. I think it's overkill, but your proposal is too simple. It might cause false matches if there were a '' (or even '')

      I'd probably use something like:


      If we're not capturing, there's no reason to need '.*' to get the rest of the line, so a simple alternation (end of line or a question mark) will work just fine.

      It's almost as few characters to use the following, and there would hopefully be less confusion for novices about what it's doing:


Node Type: perlquestion [id://556354]
Approved by moklevat
Front-paged by moklevat