Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This regexp is not working... why? I want to uppercase EVERYTHING inside tags...
#!/usr/bin/perl
$x="<tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>";
$x=~s/<(.+)>/<\U$1\E>/;
print $x;

Replies are listed 'Best First'.
Re: problem with a regexp
by Corion (Patriarch) on Oct 05, 2002 at 21:52 UTC

    The regular expression works, but not in the way you intended. It would have been helpful if you had given the output you got and the output you expected, together with an indication of the differences, but in this case it's easy:

    # You get    : <TAG1> BALHBLAHBLAHBLAH <TAG2> ASDFASDFASDFASD <TAG3>
    # You expect : <TAG1> balhblahblahblah <TAG2> asdfasdfasdfasd <TAG3>

    What is happening is that the .+ part does not stop at the first closing angle bracket but keeps matching up to the last one. The difference in what gets captured is shown below:

    # You want (captured text marked with ^) :
    <tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>
     ^^^^                    ^^^^                   ^^^^

    # You get :
    <tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    This behaviour of .+ (and likewise .*) is called greedy. There is a lazy (non-greedy) version, written .+? (or .*?), which stops at the first closing angle bracket and does what you want:

    #!/usr/bin/perl
    $x="<tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>";
    $x=~s/<(.+?)>/<\U$1\E>/g;   # /g: replace every tag, not just the first
    print $x;
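To see the greedy and lazy forms side by side, here is a minimal sketch that only prints what the capture group grabs in each case. The variable names `$greedy` and `$lazy` are illustrative and not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "<tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>";

# Greedy: .+ takes as much as it can, so it runs to the LAST '>'.
my ($greedy) = $x =~ /<(.+)>/;

# Lazy: .+? takes as little as it can, so it stops at the FIRST '>'.
my ($lazy) = $x =~ /<(.+?)>/;

print "greedy: [$greedy]\n";   # greedy: [tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3]
print "lazy:   [$lazy]\n";     # lazy:   [tag1]
```

Printing the capture like this is often the quickest way to find out why a substitution touched more text than expected.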
Re: problem with a regexp
by husoft (Monk) on Oct 05, 2002 at 21:46 UTC
    Remember to use /g, and use .+? instead of .+:
    #!/usr/bin/perl -lw
    use strict;
    my $x="<tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>";
    $x=~s/<(.+?)>/<\U$1\E>/g;
    print $x;
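A short sketch of why the /g flag matters here: the same lazy substitution is run once without /g and once with it, each on its own copy of the string. The names `$text`, `$once` and `$all` are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "<tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>";

# Copy-then-substitute idiom, so $text itself stays untouched.
(my $once = $text) =~ s/<(.+?)>/<\U$1\E>/;    # first match only
(my $all  = $text) =~ s/<(.+?)>/<\U$1\E>/g;   # every match

print "without /g: $once\n";   # without /g: <TAG1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>
print "with /g:    $all\n";    # with /g:    <TAG1> balhblahblahblah <TAG2> asdfasdfasdfasd <TAG3>
```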
    Go and read more about regexps!
Re: problem with a regexp
by chromatic (Archbishop) on Oct 05, 2002 at 22:51 UTC

    I prefer the character class [^>] to the everything-but-newline-usually dot.
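A minimal sketch of the character-class approach, assuming the same test string as the question. The second string, `$multi`, is a hypothetical input with a newline inside the tag, added only to show where the dot and `[^>]` differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "<tag1> balhblahblahblah <tag2> asdfasdfasdfasd <tag3>";

# [^>]+ can never run past a '>', so no lazy quantifier is needed.
$x =~ s/<([^>]+)>/<\U$1\E>/g;
print "$x\n";   # <TAG1> balhblahblahblah <TAG2> asdfasdfasdfasd <TAG3>

# Hypothetical input: a tag that contains a newline.
my $multi = "<tag\none> text";

# Without /s the dot does not match "\n", so this leaves the string alone.
(my $dot = $multi) =~ s/<(.+?)>/<\U$1\E>/g;

# The negated class matches any character except '>', newline included.
(my $class = $multi) =~ s/<([^>]+)>/<\U$1\E>/g;

print "dot unchanged\n" if $dot eq $multi;            # dot unchanged
print "class changed\n" if $class eq "<TAG\nONE> text"; # class changed
```

The negated class states the intent directly ("anything up to the next closing bracket") and does not depend on whether /s is in effect.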