clone4 has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I'm working on a subroutine that is supplied with a URL. The URL will always include at least one GET variable, like http://google.com/?search=ahoy, which isn't a problem. However, when there are multiple GET variables in the URL, I need to store all of them: just the variable names, without their values. So far I've got:
sub check {
    my $url = "http://google.com/?var1=aaaa&var2=aa";
    my $count = 0;
    my @arr;
    return undef if $url !~ /^http:\/\/.*?\?.*?=.*?$/i;
    $count++ while $url =~ /&/g;
    $url =~ s/^(.*?=)(.*?)$/$1/ and return $url if $count<1;
    $arr[0..$count] = $url =~ /^http:\/\/.*?\?(?:([\w\d]*=)(?:.*[&]?))*/gi;
    say $_ foreach @arr;
}
I need to store "var1=" and "var2=" in the array, and it should extend to any further variables (var3, var4, etc.).
I am having a major problem with the last regex. I've tried many variations, and this one throws a warning as well. So any ideas on how to achieve the task described above would be greatly appreciated. Regular expressions aren't very familiar to me, and I'm aiming to improve my regex skills as much as possible, so I'd be grateful for any criticism of the other regexes included. Thanks

Replies are listed 'Best First'.
Re: url get variables regex
by Your Mother (Archbishop) on May 13, 2009 at 23:37 UTC

    Do it with URI and URI::QueryParam and you'll have code that works much more reliably and is easier to read and extend. Here's a snippet to get you going-

    use URI;
    use URI::QueryParam;

    while ( <DATA> ) {
        chomp;
        my $uri = URI->new($_);
        next unless $uri->scheme eq 'http';
        next unless $uri->query_param;
        print $uri, $/;
        for my $param ( $uri->query_param ) {
            printf(" %s --> %s\n",
                   $param,
                   join(", ", $uri->query_param($param))
                );
        }
    }
    __DATA__
    http://google.com/?var1=aaaa&var2=aa
    https://google.com/?var1=aaaa&var2=aa
    http://google.com/
    ftp://google.com/
    mailto:moo@gmail.com
    http://google.com/?var=one&var=two&var=three
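Since the original question only wants the variable names (with a trailing "="), the same modules can be narrowed down to that. The following is a minimal sketch, assuming the URI distribution is installed; `param_names` is a hypothetical helper name chosen for illustration, not something from the thread or from URI itself.

```perl
use strict;
use warnings;
use URI;
use URI::QueryParam;

# Hypothetical helper: return the distinct query parameter names of a URL,
# each followed by "=", in the order they first appear.
sub param_names {
    my ($url) = @_;
    my $uri = URI->new($url);
    # With no arguments, query_param() returns the distinct keys in list context.
    return map { "$_=" } $uri->query_param;
}

my @names = param_names('http://google.com/?var1=aaaa&var2=aa');
print "$_\n" for @names;
```

A repeated key such as `var=one&var=two` is reported once, which may or may not be what is wanted; that choice is left to the caller.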
      Thanks for the snippet, I will definitely have a look at that!
Re: url get variables regex
by Marshall (Canon) on May 13, 2009 at 23:34 UTC
    I'm not sure what you are trying to do with $count?
    $arr[0..$count] = $url =~ /(^http:\/\/.*?)\?(?:([\w\d]*=)(?:.*[&]{1,}))*/gi;
    @arr = ($url =~ /(^http:\/\/.*?)\?(?:([\w\d]*=)(?:.*[&]{1,}))*/gi);
    looks more like what I think you want, although I haven't reviewed your regex in detail. When you put the regex match in list context, @arr gets all the matches from the /g (global) match option.
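The list-context behaviour of /g described here can be sketched with core Perl only. This is an illustration under simple assumptions, not a full URL parser: it treats "&" and ";" as pair separators, ignores URL escaping, and `query_names` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return each query variable name followed by "=".
sub query_names {
    my ($url) = @_;
    # Isolate the query string: everything after the first "?" up to an
    # optional "#fragment". Return an empty list if there is no "?".
    my ($query) = $url =~ /\?([^#]*)/ or return;
    # In list context, /g returns the captured group from every match.
    # A name must start a pair (start of string, or right after "&" or ";")
    # and be followed by "=".
    return map { "$_=" } $query =~ /(?:^|[&;])([^&;=]+)=/g;
}

my @arr = query_names('http://google.com/?var1=aaaa&var2=aa');
print "$_\n" for @arr;
```

Splitting the work into "find the query string" and "collect the names" keeps each regex short, and no separate $count of "&" characters is needed because the array simply grows with the number of matches.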
      Yes, you are right, that was one of the reasons why I was getting the error, just a bad coding practice from a long time ago :) Thanks for that as well.
Re: url get variables regex
by JavaFan (Canon) on May 13, 2009 at 23:24 UTC
    You cannot capture a variable number of captures from a regexp. You cannot capture more than there are paren pairs in the regexp (in 5.10, you can capture fewer).

    What's wrong with one of the several URI or CGI modules out there that already do this task? Why try to do it all in a single regexp?
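The point about the fixed number of captures can be seen in a small core-Perl experiment. This is only a sketch with a made-up query string: a single match with a repeated group keeps just the last repetition, while /g in list context performs many matches and returns one capture for each.

```perl
use strict;
use warnings;

my $query = 'var1=aaaa&var2=aa&var3=a';

# One match, one pair of parentheses repeated by "*":
# every repetition overwrites the capture, so only the last one survives.
my @single = $query =~ /^(?:(\w+)=\w*&?)*$/;

# Many matches via /g with the same single pair of parentheses:
# list context collects one capture per match.
my @global = $query =~ /(\w+)=/g;

print "single match: @single\n";   # single match: var3
print "global match: @global\n";   # global match: var1 var2 var3
```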

      I just realized that, so I changed the regex to:
      $arr[0..$count] = $url =~ /^http:\/\/.*?\?(?:([\w\d]*=)(?:.*[&]?))*/gi;
      (of course still not working properly:))
      And to answer your question, I'm fairly firmly tied to the mechanize module, which only provides the links method, and that won't recognize the separate request variables... It is just a matter of perfecting the line above :)