PerlMonks  

greedy search

by palette (Scribe)
on Jun 03, 2005 at 05:17 UTC ( [id://463104]=perlquestion )

palette has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I have seen greedy search (or greedy matching) mentioned many times in connection with Perl regular expressions. I have looked at tutorials on the web and have used it in simple applications. Could you explain the advantages of greedy matching? I think it would be very useful for text manipulation. I would like to know how to get the most out of this type of match, so please provide an explanation with examples.

Thanks

Replies are listed 'Best First'.
Re: greedy search
by ikegami (Patriarch) on Jun 03, 2005 at 05:33 UTC
    $str = 'aaaaa'; $str =~ s/a+/b/;

    Since '+' is greedy, you end up with 'b' in $str.
    If '+' wasn't greedy, you'd end up with 'baaaa' in $str.

    You can make '+' non-greedy by following it with a '?':

    $str1 = $str2 = '111b222c333c444';
    $str1 =~ s/b.+c//;   # $str1 contains '111444'.
    $str2 =~ s/b.+?c//;  # $str2 contains '111333c444'.

    '*' and '?' can similarly be modified with a '?'.
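As a minimal sketch of the point above, the following compares '*' and '?' with their non-greedy forms. The sample string and the variable names are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

my $text = 'xaaay';

# Greedy quantifiers take as much as they can.
my ($star_greedy) = $text =~ /x(a*)/;     # 'aaa'
my ($opt_greedy)  = $text =~ /x(a?)/;     # 'a'

# With a trailing '?', they take as little as they can.
my ($star_lazy)   = $text =~ /x(a*?)/;    # ''
my ($opt_lazy)    = $text =~ /x(a??)/;    # ''

# A non-greedy quantifier still grows when the rest of the pattern needs it.
my ($lazy_forced) = $text =~ /x(a*?)y/;   # 'aaa'

print "a*      => '$star_greedy'\n";
print "a?      => '$opt_greedy'\n";
print "a*?     => '$star_lazy'\n";
print "a??     => '$opt_lazy'\n";
print "a*? + y => '$lazy_forced'\n";
```

The last case is worth noting: non-greedy means "prefer the shortest", not "always empty". The quantifier expands until the rest of the pattern can match.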

    Both greedy and non-greedy are useful. Use whichever one you need in a particular situation.

      Searches are greedy by default. One fairly common situation where you want a non-greedy search is extracting the contents of an HTML tag.

      Let's say you have:

      <a href="?"><b>Hello</b></a>

      If you pattern match using /<(.+)>/, $1 will contain 'a href="?"><b>Hello</b></a'.

      If you pattern match using /<(.+?)>/, $1 will contain 'a href="?"', which is usually what people are looking for.
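The two patterns above can be run side by side. This is only a sketch on the sample string from the post; the variable names are illustrative, and the /g line is an added assumption about how one might collect every tag.

```perl
use strict;
use warnings;

my $html = '<a href="?"><b>Hello</b></a>';

my ($greedy) = $html =~ /<(.+)>/;    # runs on to the last '>'
my ($lazy)   = $html =~ /<(.+?)>/;   # stops at the first '>'

# With /g in list context, the non-greedy form returns each tag's contents.
my @tags = $html =~ /<(.+?)>/g;

print "greedy: $greedy\n";                    # a href="?"><b>Hello</b></a
print "lazy:   $lazy\n";                      # a href="?"
print "tags:   ", join(' | ', @tags), "\n";   # a href="?" | b | /b | /a
```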

      Having the option is great: it puts more power in your hands to find what you want!

Re: greedy search
by tlm (Prior) on Jun 03, 2005 at 13:16 UTC

    If you think about it, there are only two reasonable choices for this: minimalist and maximalist (aka greedy) matching. In Perl (and every other regex syntax I know of) the default is to make the matching greedy; furthermore, Perl gives you the option to make the matching minimalist if that's what you want. As far as I can tell, this choice is arbitrary; it could have gone the other way.

    But I'm just guessing here, and I could easily be wrong (i.e. there may be some fundamental reason for which maximalist matching is a more reasonable default than minimalist matching); if so, I look forward to being corrected.

    the lowliest monk

      There's often no difference between minimalistic and greedy matching, because of anchoring:

      /a*b/
      is the same as
      /a*?b/
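A small sketch of the anchoring point, using a made-up sample string: when the pattern must end at 'b', both forms are forced to the same match, and the difference only shows once nothing follows the quantifier.

```perl
use strict;
use warnings;

my $s = 'aaab';

# The trailing 'b' pins down where the match ends, so both agree.
my ($greedy_b) = $s =~ /(a*b)/;     # 'aaab'
my ($lazy_b)   = $s =~ /(a*?b)/;    # 'aaab'

# With nothing after the quantifier, the two forms diverge.
my ($greedy_open) = $s =~ /^(a*)/;  # 'aaa'
my ($lazy_open)   = $s =~ /^(a*?)/; # ''

print "a*b  => '$greedy_b'\n";
print "a*?b => '$lazy_b'\n";
print "a*   => '$greedy_open'\n";
print "a*?  => '$lazy_open'\n";
```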

      When there is a difference, minimalistic is usually just a shortcut for negative matching:

      '<a href="">text</a>' =~ /<((?!>).)*>/
      can be written as
      '<a href="">text</a>' =~ /<[^>]*>/
      can be written as
      '<a href="">text</a>' =~ /<.*?>/

      '{begin} foo {cmd} bar {end} {begin} baz {end}' =~ /\{begin\}((?!\{end\}).)*\{end\}/
      can be written as
      '{begin} foo {cmd} bar {end} {begin} baz {end}' =~ /\{begin\}.*?\{end\}/

      While minimalistic matching can replace negative matching, the two are not the same. However, I can't think of a case where minimalistic matching is required. Since it's optional, many regexp engines do not provide it, making greedy the default (and only) option. (As an aside, most regexp engines don't provide (?!...) either, so they are limited in that respect too.)
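One way to see that the two are not the same is a case where something must follow the closing delimiter. The sketch below uses a made-up input string: the non-greedy '.' is allowed to run past a '>' when the rest of the pattern fails, while the negated forms are not.

```perl
use strict;
use warnings;

my $s = '<a><b>x';

# .*? prefers the shortest match, but backtracking lets it cross a '>'.
my ($lazy)      = $s =~ /(<.*?>x)/;            # '<a><b>x'

# A negated class can never cross a '>', so the match starts at '<b>'.
my ($negated)   = $s =~ /(<[^>]*>x)/;          # '<b>x'

# The negative-lookahead form behaves like the negated class here.
my ($lookahead) = $s =~ /(<(?:(?!>).)*>x)/;    # '<b>x'

print "lazy:      $lazy\n";
print "negated:   $negated\n";
print "lookahead: $lookahead\n";
```

So the non-greedy form is a convenient shortcut when nothing after the delimiter can fail, and the negated form is the safer choice when the delimiter must never be crossed.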
