PerlMonks  

Re: Regular Expressions

by Adrade (Pilgrim)
on Jun 20, 2005 at 04:32 UTC ( [id://468223]=note )


in reply to Regular Expressions

If I understand you correctly, you want to match a body tag whether or not it comes with attributes.

You want to strip the \s from your regex: with it there, the pattern won't match '<body>', but will match '<body >' and '<body one=fish two=fish>'. As some folks mentioned, you need the s before your first delimiter (/) to indicate that you're substituting one thing for another, and the s after your last delimiter so that . also matches a newline, which lets the match continue across line breaks inside the tag. I think some folks responding forgot to remove the \s from their regexes, or I got the question wrong. The i, of course, makes the match case-insensitive. What we end up with is:
    s/(<body.*?>)/$1$OtherStuff/si;
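To see the effect of the \s and of the trailing s modifier, here is a minimal sketch. The pattern containing \s is only an assumed reconstruction of the regex from the original question (which isn't shown in this node), and the sample tags are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample tags; the "\s" pattern below is an assumption about
# what the original question's regex looked like.
my @tags = ('<body>', '<body >', '<body one=fish two=fish>');

my %with_space;     # does the pattern that requires \s match?
my %without_space;  # does the pattern with \s removed match?

for my $tag (@tags) {
    $with_space{$tag}    = $tag =~ /<body\s.*?>/si ? 1 : 0;
    $without_space{$tag} = $tag =~ /<body.*?>/si   ? 1 : 0;
    printf "%-28s with \\s: %d   without \\s: %d\n",
        $tag, $with_space{$tag}, $without_space{$tag};
}

# The trailing s modifier only changes what . matches: with it, . also
# matches a newline, so a tag broken across lines is still found.
my $multiline     = "<body\n  bgcolor=white>";
my $dot_without_s = $multiline =~ /<body.*?>/i  ? 1 : 0;
my $dot_with_s    = $multiline =~ /<body.*?>/si ? 1 : 0;
print "multi-line tag, no s modifier: $dot_without_s\n";
print "multi-line tag, s modifier:    $dot_with_s\n";
```

Only the bare '<body>' fails with the \s version; the multi-line tag is found only when the s modifier is present.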
I would prefer, for the sake of style, to use the following instead - indicating that the characters before the > should be anything except >.
    s/(<body[^>]*>)/$1$OtherStuff/si;
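I also like "pushing" stuff around, so the following captures the rest of the document and puts it back after the inserted text. It makes explicit that only the first instance of body is matched, though as far as I know it won't make a difference, since s/// without the g modifier only replaces the first match anyway.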
    s/(<body[^>]*>)(.*)$/$1$OtherStuff$2/si;
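A quick way to check that the three substitutions behave the same is to run them on copies of one string. This is only a sketch: the sample HTML and the value of $OtherStuff are made up for illustration.

```perl
use strict;
use warnings;

my $OtherStuff = '<!-- inserted -->';   # hypothetical replacement text

# Hypothetical document: an upper-case, multi-line body tag first,
# then a second body tag that should be left alone.
my $html = "<html><BODY\n bgcolor=\"white\">\n<p>one</p>\n<body class=\"second\">\n</html>\n";

# Three independent copies, one per substitution.
my ($lazy_dot, $negated, $anchored) = ($html) x 3;

$lazy_dot =~ s/(<body.*?>)/$1$OtherStuff/si;
$negated  =~ s/(<body[^>]*>)/$1$OtherStuff/si;
$anchored =~ s/(<body[^>]*>)(.*)$/$1$OtherStuff$2/si;

print $negated;
print "all three agree\n" if $lazy_dot eq $negated and $negated eq $anchored;
```

On this input all three insert the text right after the first body tag and leave the second one untouched.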
Hope it helps!
  -Adam

--
Impossible! The Remonster can only be killed by stabbing him in the heart with the ancient bone saber of Zumakalis!

Replies are listed 'Best First'.
Re^2: Regular Expressions
by dyer85 (Acolyte) on Jun 20, 2005 at 08:49 UTC

    You are using a greedy regex there, and s/(<body[^>]*>)/$1$blah/gi; will probably grab more than you want.

    s/(<body[^>]*?>)/$1$blah/si;

    This would be sufficient. The s modifier is harmless here but not strictly needed: [^>] already matches newlines, so a <body> tag spanning multiple lines is handled either way. I didn't use g, as I doubt you want to match <body> on a global scale.

    Peace

      I don't understand why this would grab more than what is wanted. It seems to me that the [^>]* will grab everything that isn't a '>', so it'll grab stuff until we finally get to the first instance of '>'. Even though it is greedy, I don't think it would grab past the '>' of the '<body ... >' tag. Care to expand?

      $ perl -e 'my $str = "<body something=\"yep\"><a href=\"..\">"; $str =~ s/<body[^>]*>/<body>/; print "$str\n"'
      <body><a href="..">
      $

          -Bryan
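The same check can be extended to compare the greedy and non-greedy forms side by side. This is a minimal sketch using the string from the one-liner above; the greedy-dot pattern is added only for contrast and doesn't come from anyone's post.

```perl
use strict;
use warnings;

my $str = '<body something="yep"><a href="..">';

# Copy-and-substitute idiom: each variable gets its own modified copy.
(my $greedy = $str) =~ s/<body[^>]*>/<body>/;
(my $lazy   = $str) =~ s/<body[^>]*?>/<body>/;

# For contrast only: a greedy dot is the form that runs past the first '>'.
(my $greedy_dot = $str) =~ s/<body.*>/<body>/;

print "greedy [^>]*  : $greedy\n";      # <body><a href="..">
print "lazy   [^>]*? : $lazy\n";        # <body><a href="..">
print "greedy .*     : $greedy_dot\n";  # <body>
```

Because [^>] can never consume a '>', greedy and lazy quantifiers stop at the same place here; only the dot version swallows the following tag.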
