Maclir has asked for the wisdom of the Perl Monks concerning the following question:

I wish to remove a parameter and value pair from a CGI request string, which could be of the form:
. . . frame=some_name . . .
and there may be a trailing "&" to separate it from the next parameter.
I used the following bit of code:
$print_link = $ENV{'REQUEST_URI'} ; $print_link =~ s/frame=[\w*]&*//;
to look for the string starting with "frame=", followed by any number of "word" characters, then optionally an ampersand.
Sadly, for an input containing
. . .&frame=content
I get
. . .&ontent
What have I done wrong?
Thanks

Replies are listed 'Best First'.
Re: Regexp problem
by chromatic (Archbishop) on Apr 18, 2000 at 08:23 UTC
    (edited on 18 April to correct what btrott notes below)

    Your regexp asks for:

    the literal string "frame=", followed by ONE character from a character class containing alphanumerics, the underscore AND an asterisk, followed by the ampersand repeated zero or more times
    and gets rid of them. What matches in your string is just "frame=c" out of &frame=content. Can you see why?

    I think you want something like this: $print_link =~ s/frame=(\w*)&*?/$1/; This looks for your "frame=", saves all word characters, looks for 0 or more ampersands non-greedily, and substitutes just the saved alphanumerics for the whole thing.

    If you were to use CGI.pm -- and you probably should -- you could do: my $frame = $q->param("frame");
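    Note that the substitution above keeps the value and only drops "frame=". If the goal is to remove the whole pair, as the original question asks, a minimal sketch might look like the following. The helper name strip_param and the sample URIs are made up for illustration, and the sketch assumes "&" is the only separator and that the value contains no unescaped "&".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop one name=value pair
# from the query part of a URI.
sub strip_param {
    my ($uri, $name) = @_;

    # (?<=[?&]) requires the name to start right after "?" or "&",
    # so "frame" does not match inside a longer name such as "iframe".
    # [^&]* takes the value up to the next separator; &? takes that separator.
    $uri =~ s/(?<=[?&])\Q$name\E=[^&]*&?//;

    # If the pair was the last one, a dangling "&" or "?" is left behind.
    $uri =~ s/[&?]$//;

    return $uri;
}

print strip_param('/bar?method=go&frame=content&name=baz', 'frame'), "\n";
print strip_param('/bar?method=go&frame=content', 'frame'), "\n";
```

    The first call should leave /bar?method=go&name=baz and the second /bar?method=go.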

      chromatic wrote that the OP's regex matches:
      > the literal string "frame="
      > followed by a character class containing alphanumerics and the underscore
      > repeated zero or more times
      > ...
      I don't think this is right, particularly the bit about the character class. Within a character class definition ("[]"), an asterisk matches an asterisk--it doesn't have a meta-meaning within a character class. So what that character class actually matches is alphanumerics, the underscore, and an asterisk. And it matches it *one time*-- not zero, not more than 1.

      That's why just the "c" from "content" got included-- because the character class swipes up one character.

      As an example, take a look at this:

      my $print_link = "http://www.foo.com/bar?method=go&frame=*content&name=baz";
      $print_link =~ s/frame=[\w*]&*//;
      print $print_link, "\n";
      This prints out
      http://www.foo.com/bar?method=go&content&name=baz
      So the regex matched "frame=*".
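      To make the contrast concrete, here is a small sketch that runs the original pattern and a version with the asterisk moved outside the brackets against the same input. The variable names and the sample URI are illustrative only.

```perl
use strict;
use warnings;

my $uri = 'http://www.foo.com/bar?method=go&frame=content&name=baz';

# [\w*] is a character class: exactly one character that is either
# a word character or a literal asterisk.
(my $one_char = $uri) =~ s/frame=[\w*]&*//;

# \w* is a quantified escape: zero or more word characters.
(my $whole_value = $uri) =~ s/frame=\w*&*//;

print "$one_char\n";     # only "frame=c" was removed
print "$whole_value\n";  # "frame=content&" was removed
```

      The first result still contains "ontent"; the second has lost the whole pair along with its trailing ampersand.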
        Whoops, you're right. Moral of the story, never leave a writeup half finished and come back thirty minutes later without reviewing things.

        I must have confused the character class braces with the grouping brackets, as the original poster did. Moral of THAT story, always ask yourself if you want curved brackets or square braces.

        Thank you. I made two mistakes: using square brackets when I should have used parentheses, and thinking that a * within the square brackets would have the same effect as using a * outside the square brackets, in the normal part of the regexp.
        Needless to say, it now works as I originally intended. Maybe I need to read further into the details of regexps, and to determine when one should use square brackets or (). I was under the impression that you only grouped things with () if you wished to later refer to them as a substitution string (like $1).
        Thanks.
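        On the last point, parentheses do two jobs: they group, so that a quantifier applies to the whole group, and they capture. When only the grouping is wanted, Perl also offers the non-capturing form (?:...). A short sketch, with a made-up input string:

```perl
use strict;
use warnings;

my $text = 'frame=abab&next=1';

# Capturing group: + repeats the whole "ab", and the group's last
# repetition is returned as the first capture.
my ($captured) = $text =~ /frame=(ab)+/;

# Non-capturing group: same grouping, but nothing is stored for it,
# so the first capture is the later (\w+).
my ($first_capture) = $text =~ /frame=(?:ab)+&(\w+)/;

print "$captured\n";       # ab
print "$first_capture\n";  # next
```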