in reply to Regular expression to match text between two tags (was: Help!! Regular Expressions)

Since you have already read MRE, Death to Dot Star! should be clear to you; if it's not, read it and then go back to MRE :-).

The Owl book (Mastering Regular Expressions) has some quite good hints on this, and it also demonstrates an interesting technique for avoiding .* and its lazy version .*?. Here's what I came up with after reading through my copy of MRE. Note that my solutions differ from physi's solution in that they take the shortest possible match, while physi's solution always takes the longest possible text between the ::: markers.

#!/usr/bin/perl -w
use strict;

# Slurp the input into $data
my $data;
{
    local $/;
    $data = <DATA>;
};

# This is the naive way, using the non-greedy .*?
if ($data =~ /:::(.*?):::/ms) {
    print "$1\n";
} else {
    print "No match.\n";
};

# This is the "perfect" way (should be described in the Owl book
# somewhere). It's much more specific about what it wants, and
# thus longer and more complex :-)
if ($data =~ /:::             # start
              ([^:]*          # As many non-: as we can gobble
                (?:
                  ::?[^:]+    # and then one or two :'s as long as they are
                )*            # followed by something non-:
              )
              :::             # end
             /msx             # And we want to match spanning lines
                              # and use eXtended re syntax
   ) {
    print "$1\n";
} else {
    print "No match.\n";
};

__DATA__
Some foo:::This is a some text.
Today you watered the dog and ::test: bathed the plants.
The server asked you what permission you had
to tell it what to do on it's day off.
This was your day.::: More foo.
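The shortest-versus-longest difference mentioned above only shows up when the input contains more than one pair of ::: markers. Here is a minimal sketch of that, using a made-up sample string; the greedy .* pattern is just one way to get longest-match behaviour and is an assumption, not necessarily what physi's code looked like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample containing two :::-delimited chunks
my $text = "a:::first::: b :::second::: c";

# Greedy: .* runs to the end of the string, then backtracks
# to the LAST ::: it can find
my ($longest) = $text =~ /:::(.*):::/s;

# Non-greedy: .*? stops at the FIRST closing :::
my ($shortest) = $text =~ /:::(.*?):::/s;

print "greedy:     $longest\n";     # greedy:     first::: b :::second
print "non-greedy: $shortest\n";    # non-greedy: first
```

With only one pair of markers in the input, as in the __DATA__ section above, both variants would capture the same text.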

Re: Re: Help!! Regular Expressions
by suaveant (Parson) on Apr 12, 2001 at 16:25 UTC
    Small point, but since you don't use . in your second RE, you don't actually need the s modifier there. You also don't use the ^ or $ anchors, so you don't need the m modifier either. Not that it matters much.
                    - Ant
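A quick way to check the point about modifiers is to run the unrolled pattern with and without /m and /s on multi-line input. The sketch below uses a made-up sample string; a negated character class such as [^:] matches newlines regardless of /s, which is why the result does not change, while the naive .*? version does depend on /s.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical multi-line sample
my $text = "foo:::line one\nline ::two: here\nline three::: bar";

# Unrolled pattern with only /x: no /s, no /m
my ($plain) = $text =~ /::: ( [^:]* (?: ::? [^:]+ )* ) :::/x;

# Same pattern with /m and /s added
my ($with_ms) = $text =~ /::: ( [^:]* (?: ::? [^:]+ )* ) :::/msx;

print "identical captures\n"
    if defined $plain && defined $with_ms && $plain eq $with_ms;

# The naive pattern uses a dot, so without /s it cannot cross "\n"
my ($naive_no_s) = $text =~ /:::(.*?):::/;
print "naive without /s: ",
    (defined $naive_no_s ? "match" : "no match"), "\n";
```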