
Correct Regex for reading stock symbol?

by Anonymous Monk
on Jan 31, 2006 at 17:28 UTC (#526810=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


I am trying to have my script untaint an incoming stock symbol that is entered through an HTML form. Most stock symbols work, and the page loads fine after parsing the submitted form data.

However, one symbol stops my script from loading normally. It stalls until it finally gives me a timeout error. Also, when I use this symbol, my CPU usage goes up to 100%! It's as if it causes a buffer overflow in the script.

The stock symbol I'm talking about is AIDO.OB

For all other stock symbols my script looks fine, but when I type in "AIDO.OB", it gives me this problem every time.

Here's the regex I'm using to untaint:
$stock_symbol = $INPUT->param('stock_symbol');
if ($stock_symbol =~ /^([-\@\w.]+)$/ && length($stock_symbol) < 11
    && $stock_symbol ne "") {
    $stock_symbol = $1;
} else {
    print "Invalid Symbol!\n";
    exit;
}
Any ideas why this is doing it? I'm using ActiveState Perl btw on a WinXP server with IIS 5.0.


Replies are listed 'Best First'.
Re: Correct Regex for reading stock symbol?
by ikegami (Patriarch) on Jan 31, 2006 at 18:02 UTC

    I don't know what the problem is, but I can suggest some ways of cleaning up your code.

    • && $stock_symbol ne "" is useless since your regexp will never match an empty string.

    • Also, $stock_symbol = $1; is useless, since you're capturing the entirety of the regexp.

    • Finally, length($stock_symbol) < 11 can be removed by replacing the + in the regexp with {1,10}.

    What follows is what you get when you apply the above cleanups:

    $stock_symbol = $INPUT->param('stock_symbol');
    if ($stock_symbol !~ /^[-\@\w.]{1,10}\z/) {
        print "Invalid Symbol!\n";
        exit;
    }

    Update: Oops, $ can match before a trailing newline. Changed $ to \z to fix.
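
The difference between $ and \z noted in the update is easy to check in isolation. This is a minimal sketch, assuming the form value arrives with a trailing newline; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: a valid-looking symbol followed by a newline.
my $input = "AIDO.OB\n";

# `$` matches at the very end of the string OR just before a final
# newline, so the newline slips through this check.
my $dollar_ok = $input =~ /^[-\@\w.]+$/ ? 1 : 0;

# `\z` matches only at the very end of the string, so this check fails.
my $z_ok = $input =~ /^[-\@\w.]+\z/ ? 1 : 0;

print "dollar: $dollar_ok, z: $z_ok\n";
```

With \z, the newline has to be matched by the character class, which it is not, so the value is rejected.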

      Also, $stock_symbol = $1; is useless, since you're capturing the entirety of the regexp.
      Wrong. This is one way to untaint $stock_symbol. You can read more about taint checks in perlsec.
        Ah yes. The OP even mentioned tainting. Taint is something I need to use more. The OP really shouldn't be using the same variable name for both the tainted and the untainted variables. Joel wrote an excellent paper on the subject. Admittedly, Joel's language didn't have built-in tainting, but I recommend the technique nonetheless.
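
To illustrate the two points above (untainting through a capture, and keeping tainted and untainted values in separately named variables), here is a rough sketch. The function name untaint_symbol and the variable names are hypothetical, and the 10-character limit simply mirrors the OP's length check.

```perl
use strict;
use warnings;

# Hypothetical helper: validate a raw (tainted) value and return a clean
# copy, or nothing on failure. Under -T, a regex capture such as $1 is
# the usual way to obtain an untainted value.
sub untaint_symbol {
    my ($tainted_symbol) = @_;
    return unless defined $tainted_symbol;
    if ($tainted_symbol =~ /^([-\@\w.]{1,10})\z/) {
        return $1;    # the capture carries the untainted data
    }
    return;
}

# The raw value and the validated value never share a name.
my $clean_symbol = untaint_symbol("AIDO.OB");
print defined $clean_symbol ? "ok: $clean_symbol\n" : "Invalid Symbol!\n";
```

Keeping the names distinct makes it obvious at each use site whether the value has passed validation.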
      Thanks for all the help. I realized that it may not be that portion of the code that's causing a CGI timeout. I'm pretty sure it's the "O.O" part of the symbol "AIDO.OB" that's causing a breakage somewhere in my script. All other symbols work fine, and if I try adding a "." anywhere inside a symbol, it will stall. So, for example, if I tried using "BLAH.SYMB", it would still stall.

      I guess I will try breaking down my script more and try to find the problem.


        What are you doing with the 'untainted' data?


Re: Correct Regex for reading stock symbol?
by davido (Cardinal) on Jan 31, 2006 at 17:38 UTC

    The following snippet does not replicate the behavior you're describing:

    use strict;
    use warnings;
    use CGI;

    my $INPUT = CGI->new();
    my $stock_symbol = $INPUT->param('stock_symbol');
    if ($stock_symbol =~ /^([-\@\w.]+)$/ && length($stock_symbol) < 11
        && $stock_symbol ne "") {
        $stock_symbol = $1;
        print $1, "\n";
    } else {
        print "Invalid Symbol!\n";
        exit;
    }

    I tested it from the command line like this:

    perl "stock_symbol=AIDO.OB"

    It's nice that CGI.pm allows you to test from the command line like that; it helps in tracking down bugs.

    Anyway, it appears that the source of your trouble is not contained within the snippet you showed us. Back to the drawing board. Try lacing your script with logging notices so that you can see where it's hanging up.
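
One possible way to lace a script with logging notices is sketched below. The checkpoint helper and its labels are hypothetical, and the sketch assumes STDERR ends up somewhere readable; under IIS it may be simpler to append to a file instead.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: write a timestamped checkpoint to STDERR.
# STDERR is unbuffered by default, so the line appears even if the
# script hangs right afterwards.
sub checkpoint {
    my ($label) = @_;
    my $line = sprintf "[%.3f] pid %d: %s\n", Time::HiRes::time(), $$, $label;
    print STDERR $line;
    return $line;
}

checkpoint("symbol validated");
# ... suspect code goes here ...
checkpoint("after suspect code");
```

The last checkpoint that shows up before the stall narrows down which section of the script is spinning.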


Re: Correct Regex for reading stock symbol?
by kwaping (Priest) on Jan 31, 2006 at 17:39 UTC
    Removed: I learn something new every day on this site! Thanks to samtregar for today's lesson.

    I thought it had something to do with the unescaped period inside the brackets in the regex, but samtregar proved me wrong.
      Not true. Observe:

      $ perl -e 'print "broken\n" if "foo" =~ /[.]/'
      $ perl -e 'print "ok\n" if "foo." =~ /[.]/'
      ok

      The period is not magical inside a character class.


      Removed: I learn something new every day on this site!

      Unfortunately, nobody else gets to, because you removed the node instead of updating it. Please consider that next time, so others can get the full context of the thread and benefit in the mutual learning opportunity.

