harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

I am a beginner Perl programmer. I am trying to write a function getArgsURL() that will extract the parameters from a URL string. I have samples $A, $B, and $C which I would like my function to work on. But every time I run this code, it throws an error, and I have no clue how or why this error occurs. It says:

Quantifier follows nothing in regex; marked by <-- HERE in m/? <-- HERE / at C:\ BIN\PERL\Example.pl line 104.

Press any key to continue . . .

use strict;
use warnings;

# Samples
my $A = "file:///c:/html/testing.html?P1=123&P2=%28%28BLAH+BLAH+BLAH%29%29";
my $B = "http://www.cnn.org/g/ar.shtml?c=123055&s=%28Top+Stories+%29";
my $C = "http://www.something.com/example/article.php?P1=123&P2=%28%28DATA+GOES+HERE%29%29%0D%0A#PGTOP";

my @R = getArgsURL($A);
foreach my $S (@R) { print "\nR: ", $S; }

###############################################
# This function extracts arguments from an URL
# string and returns them in pairs.
#
# Example: @R = getArgsURL("http://www.cnn.org/g/ar.shtml?c=123055&s=%28Top+Stories+%29#PGTOP");
#   R[0] ---> "c"
#   R[1] ---> "123055"
#   R[2] ---> "s"
#   R[3] ---> "(Top Stories)"
#
sub getArgsURL
{
  my $S = shift;
  my @OUTPUT;
  my @X;
  my $P;

  splitAB($S, '?');
  splitAB($b, '#');
  @X = split('&', $a);

  foreach $S (@X)
  {
    $P = index($S, '=');
    if ($P < 0) { push(@OUTPUT, $S); }
    push(@OUTPUT, decodeURLstr(substr($S, 0, $P)));
    push(@OUTPUT, decodeURLstr(substr($S, $P+1)));
  }
  return @OUTPUT;
}

################################################
# This function works like the split function,
# however it will only split STRING into two
# chunks at the first occurrence of PATTERN.
# The section before the first occurrence of
# PATTERN goes into $a, and rest goes into $b.
#
# (This function has no return value.
# It simply changes the values of $a and $b.)
#
# Usage : splitAB(STRING, PATTERN)
#
sub splitAB
{
  my $STRING = shift;
  my $PATTERN = shift;
  my @OUTPUT = split($PATTERN, $STRING, 2);
  $a = $b = '';
  if (@OUTPUT > 0) { $a = $OUTPUT[0]; }
  if (@OUTPUT > 1) { $b = $OUTPUT[1]; }
}

############################################
# This is the opposite of the escape() function.
#
sub unescape
{
  my $XX;
  my $BYTE;
  my $INPUT = shift;
  my @OUTPUT;

  for (my $i = 0; $i < length($INPUT); $i++)
  {
    $BYTE = substr($INPUT, $i, 1);
    if (ord($BYTE) == 37)
    {
      $BYTE = '';
      $XX = substr($INPUT, $i+1, 2);
      if (length($XX) == 2)
      {
        $i += 2;
        $BYTE = chr(hex($XX));
      }
    }
    push(@OUTPUT, $BYTE);
  }
  return join("", @OUTPUT);
}

################################################
# This function decodes an URL-style string.
# Works like the unescape function, however
# it will also convert '+' signs to spaces.
#
sub decodeURLstr
{
  my $S = shift;
  # return unescape( join(' ', split('+', $S) ) );
}

What on earth is going on? It gives me an error when I call the split() function.

Replies are listed 'Best First'.
Re: RegExp error PLEASE HELP!
by afoken (Chancellor) on Jul 04, 2016 at 19:44 UTC

    WOULD YOU PLEASE STOP SHOUTING? WE ARE NOT YET DEAF!

    split expects a regular expression as its first parameter. + has a special meaning (quantifier) in regular expressions, so you have to escape it.

    A nasty trap of split is that you can pass the RegExp as a string. Better use // or qr() for that, so your intention becomes clearer. And to escape the +, either prefix it with a backslash or make it a one-character character class:

    my @array=split /\+/,$someString;

    or

    my @array=split /[+]/,$someString;
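
    To see both escaping styles side by side, and the failure that an unescaped + produces, here is a minimal sketch. The variable names ($s, $pattern, $ok) are made up for illustration; the eval only works as a safety net here because the pattern sits in a variable and is therefore compiled at run time.

    ```perl
    use strict;
    use warnings;

    my $s = 'a+b+c';

    # Both forms match a literal plus sign.
    my @escaped = split /\+/, $s;     # backslash-escaped plus
    my @class   = split /[+]/, $s;    # plus inside a character class

    # A string pattern is still treated as a regex, so a bare '+' dies with
    # "Quantifier follows nothing in regex". eval traps that run-time error.
    my $pattern = '+';
    my $ok = eval { my @bad = split $pattern, $s; 1 };

    print "escaped: @escaped\n";
    print "class:   @class\n";
    print "bare '+' as pattern failed: ", ($ok ? "no" : "yes"), "\n";
    ```

    Either escaping style gives the same three fields; which one reads better is mostly a matter of taste.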

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: RegExp error PLEASE HELP!
by haukex (Archbishop) on Jul 04, 2016 at 19:56 UTC

    Hi harangzsolt33,

    As others have pointed out, the first argument to split is a regular expression. So to protect against special characters you could do something like split(quotemeta($PATTERN), $STRING, 2); (see quotemeta).
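
    As a rough illustration of the quotemeta approach (the variable names below are invented for this sketch and are not from the original code), the separator is escaped once and then handed to split with a limit of 2:

    ```perl
    use strict;
    use warnings;

    my $url = 'http://www.cnn.org/g/ar.shtml?c=123055&s=%28Top+Stories+%29';
    my $sep = '?';    # meant as a literal separator, not as a regex

    # quotemeta backslash-escapes ASCII characters that are not letters,
    # digits or underscore, so '?' becomes '\?'.
    my $safe = quotemeta($sep);

    # Limit 2: split only at the first occurrence of the separator.
    my ($before, $after) = split $safe, $url, 2;

    print "pattern: $safe\n";
    print "before:  $before\n";
    print "after:   $after\n";
    ```

    Writing /\Q$sep\E/ inside a regex literal should have the same effect as calling quotemeta first.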

    However, you don't need to parse URLs by hand; there's the URI module:

    my @urls = (
        "file:///c:/html/testing.html?P1=123&P2=%28%28BLAH+BLAH+BLAH%29%29",
        "http://www.cnn.org/g/ar.shtml?c=123055&s=%28Top+Stories+%29",
        "http://www.something.com/example/article.php?P1=123&P2=%28%28DATA+GOES+HERE%29%29%0D%0A#PGTOP",
    );

    use URI;
    for my $url (@urls) {
        my $u = URI->new($url);
        print "$u\n";
        my @q = $u->query_form;
        print "\t\"$_\"\n" for @q;
    }
    __END__
    file:///c:/html/testing.html?P1=123&P2=%28%28BLAH+BLAH+BLAH%29%29
        "P1"
        "123"
        "P2"
        "((BLAH BLAH BLAH))"
    http://www.cnn.org/g/ar.shtml?c=123055&s=%28Top+Stories+%29
        "c"
        "123055"
        "s"
        "(Top Stories )"
    http://www.something.com/example/article.php?P1=123&P2=%28%28DATA+GOES+HERE%29%29%0D%0A#PGTOP
        "P1"
        "123"
        "P2"
        "((DATA GOES HERE))
    "

    Hope this helps,
    -- Hauke D

      Oh, wow. Thanks!! I didn't know that.

      Ok, I wasn't aware that split() expects a regular expression... That's why. Ugh. :-P

      And sorry, I wasn't shouting. Didn't mean to. LOL

      See, one of the reasons why I've come to like the perl language is because you can ask a question and get an answer INSTANTLY even on July 4th when everybody is resting and cooking BBQ outside. And the internet is full of perl documentation. There are thousands and millions and billions of pages of helpful stuff written about Perl. And it's available all for free. You don't get the same help with other languages. :-D

        you can ask a question and get an answer INSTANTLY even on July 4th when everybody is resting and cooking BBQ outside

        You seem to have a very limited view of the world. You know, earth has way more surface than just the tiny 2% of the USA, and there are very different cultures on the remaining surface. In many places of the world, July 4th is just an ordinary (work) day, perhaps too cold and wet for any kind of BBQ.

        And the internet is full of perl documentation. There are thousands and millions and billions of pages of helpful stuff written about Perl. And it's available all for free.

        Most of it, I guess, yes. But often, you get what you paid for. Crappy code written for Perl 4, or ancient versions of Perl 5, by people knowing neither Perl nor how to write reliable software.

        This is not limited to Perl; you can find lots of crappy, ancient code for any mainstream language.

        You don't get the same help with other languages. :-D

        Yes, you do, with similar results. At least for mainstream languages. When it comes to really exotic stuff, like MUMPS or embedded languages for niche products (like medical or lab environments), things get harder, and answers get rarer.

        One thing about Perl that really stands out is CPAN. Almost everything you need for your Perl problem, in one place, and in the best case with a comprehensive set of tests for each package you want to install. Of course, not even CPAN is a paradise of perfect code, and you can find crap on CPAN, too. But try to find a library to access databases, to interface with a webserver, or to implement some obscure network protocol for a different language. You need a good search engine and a lot of luck, because people tend to host their libraries at some random place on the web. You may even have to pay for access to the libs, either with personal data or with money.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: RegExp error PLEASE HELP!
by Marshall (Canon) on Jul 04, 2016 at 20:20 UTC
    Instead of splitting on + and rejoining with a space, you could use a regex substitution for that. The split already has some regex overhead.

    use warnings;
    use strict;

    my @tests = ("this is+ +test", "this+test", "morepluses+++++++test");
    for (@tests) {
        s/\++/ /g;    #multiple+ to single space (see note)
        print "$_\n";
    }
    __END__
    this is   test
    this test
    morepluses test

    or could use this regex to also compress spaces:

    s/[\s\+]+/ /g;    #multiple spaces or + to single space

    this is test
    this test
    morepluses test

    Note: in /\++/ the first + is escaped to mean a literal + sign, the second plus is a "quantifier command" to the regex engine, meaning "one or more". Your split error arises because of this second meaning and there is nothing before the + to "quantify".

    Update: If you just want to convert each + to a space, tr will run much faster at that job because it builds a simple translation table. It does not use a regex, so the + should not be escaped:

    use warnings;
    use strict;

    my @tests = ("this is+ +test", "this+test", "morepluses+++++++test");
    for (@tests) {
        tr /+/ /;    #lower overhead than the regex engine
        print "$_\n";
    }
    __END__
    this is   test
    this test
    morepluses       test
Re: RegExp error PLEASE HELP!
by AnomalousMonk (Archbishop) on Jul 04, 2016 at 21:20 UTC

    Further to Marshall's post:   ... and if you want to use  tr/// to reduce multiple occurrences of a character to a single translated character, there's the  /s (squash) modifier (see  tr/// in Quote-Like Operators in perlop):

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $str = 'a+ +test+lotsapluses+++++++test';
     print qq{'$str'};
     ;;
     $str =~ tr{+}{ }s;
     print qq{'$str'};
    "
    'a+ +test+lotsapluses+++++++test'
    'a   test lotsapluses test'


    Give a man a fish:  <%-{-{-{-<

Re: RegExp error PLEASE HELP!
by Marshall (Canon) on Jul 04, 2016 at 21:57 UTC
    A few comments on your code. I downloaded it and got it to run with these mods:

    sub splitAB
    {
      my $STRING = shift;
      my $PATTERN = shift;
      # when debugging "print is your friend"
      print "pattern in sub splitAB= $PATTERN\n";       #####
      print "string in sub splitAB = $STRING\n";        #####
      my @OUTPUT = split(/\Q$PATTERN\E/, $STRING, 2);   #####
      $a = $b = '';
      if (@OUTPUT > 0) { $a = $OUTPUT[0]; }
      if (@OUTPUT > 1) { $b = $OUTPUT[1]; }
    }

    sub decodeURLstr
    {
      my $S = shift;
      return unescape( join(' ', split('\+', $S) ) );   #####
    }

    In splitAB, the "?" has meaning to the regex engine; the \Q and \E tell split not to pay attention to that. Output with my modifications:

    pattern in sub splitAB= ?
    string in sub splitAB = file:///c:/html/testing.html?P1=123&P2=%28%28BLAH+BLAH+BLAH%29%29
    pattern in sub splitAB= #
    string in sub splitAB = P1=123&P2=%28%28BLAH+BLAH+BLAH%29%29

    R: P1
    R: 123
    R: P2
    R: ((BLAH BLAH BLAH))
    Process completed successfully

    I guess that is what you wanted?
    From looking at the code and the use of substr(), I am guessing that you have a C background? In Perl, the use of regex instead of substr is the norm. As you write more Perl, you will use regex more often. In addition, you will start using modules for common tasks like parsing a URI. You "rolled your own" without necessity. I can tell that a lot of work went into your code, but you made the job harder on yourself than it needed to be. Pay attention to haukex's post.
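
    To give an idea of what the regex style looks like, here is one possible sketch of the decoding step. The name decode_url_str is invented for this example; it is meant as a rough stand-in for decodeURLstr plus unescape, not a drop-in copy. Unlike the original loop, it leaves a % that is not followed by two hex digits untouched.

    ```perl
    use strict;
    use warnings;

    # Hypothetical regex-based replacement for decodeURLstr()/unescape().
    sub decode_url_str {
        my ($s) = @_;
        $s =~ tr/+/ /;                                # '+' stands for a space
        $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX -> that byte
        return $s;
    }

    print decode_url_str('%28Top+Stories+%29'), "\n";
    ```

    The tr has to run before the substitution; otherwise an encoded plus sign (%2B) would be decoded to + first and then wrongly turned into a space.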

    Have a happy and safe 4th of July!

    Update: Do not use $a or $b. These variables have special meaning to Perl in sort. Use of them will cause problems in code that sorts.

    Update 2: perhaps

    my @OUTPUT = split(/\Q$PATTERN\E/, $STRING, 2);   #####
    $a = $b = '';
    if (@OUTPUT > 0) { $a = $OUTPUT[0]; }
    if (@OUTPUT > 1) { $b = $OUTPUT[1]; }

    better written as (comments about the use of $a and $b notwithstanding):

    ($a, $b) = split(/\Q$PATTERN\E/, $STRING, 2);     #####
    $a //= '';    #null string if undefined
    $b //= '';    #null string if undefined
Re: RegExp error PLEASE HELP!
by $h4X4_|=73}{ (Monk) on Jul 05, 2016 at 10:59 UTC

    Your error is the fact that you need to escape the characters for $PATTERN

    splitAB($S, '\?');
    splitAB($b, '\#');

    But I also see you are using the special variables $a and $b ($a = $b = '';). You would be better off renaming those variables to something other than $a and $b, like my $data_a;
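
    One way to avoid the globals altogether is to have the function return its two halves. This is only a sketch: split_ab and the variable names are made up here, and the // operator needs Perl 5.10 or newer.

    ```perl
    use strict;
    use warnings;

    # Hypothetical variant of splitAB() that returns (head, tail) as a list
    # instead of writing to the package variables $a and $b.
    sub split_ab {
        my ($string, $pattern) = @_;
        # \Q...\E makes the separator literal; limit 2 splits only once.
        my ($head, $tail) = split /\Q$pattern\E/, $string, 2;
        return ($head // '', $tail // '');
    }

    my $url = 'http://www.something.com/example/article.php?P1=123&P2=%28%28DATA+GOES+HERE%29%29%0D%0A#PGTOP';
    my (undef, $rest)  = split_ab($url, '?');     # drop everything before '?'
    my ($query, $frag) = split_ab($rest, '#');    # separate query and fragment

    print "query:    $query\n";
    print "fragment: $frag\n";
    ```

    The caller picks the names for the pieces, so nothing collides with the $a and $b that sort uses.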

Re: RegExp error PLEASE HELP!
by Anonymous Monk on Jul 04, 2016 at 19:42 UTC