Sullust has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to find a module of some sort which will allow me to check variables to see if they contain JavaScript. I don't know JavaScript myself, so I can't just whip up the RegEx to get it done. Does anyone know of such a module, or have a RegEx already written that will do the job?
Thanks.

Replies are listed 'Best First'.
Re: Checking forms for JavaScript
by cLive ;-) (Prior) on Jul 24, 2001 at 02:57 UTC
    If Cubes is right on what you're doing, I suggest you strip out all <script...> tags - whatever the scripting language. Here's a regex for this:
    $text =~ s/<SCRIPT[^>]*>    # the opening <SCRIPT...> tag
               .*?              # as few chars as possible until...
               (?:<\/SCRIPT>|$) # closing script tag or the end of the text ($)
              //xgis;           # (x) comments, (g)lobal, case (i)nsensitive,
                                # treat string as (s)ingle line

    # uncommented version
    $text =~ s/<SCRIPT[^>]*>.*?(?:<\/SCRIPT>|$)//gis;

    You need to check for strings that don't contain a closing tag, as the example below shows:

    # imagine the following are two consecutive posts on a bulletin board
    $first_post = <<_END_;
    <SCRIPT LANGUAGE="Javascript">
    document.location.href = "http://www.hamsterdance.com"
    /*
    _END_

    $second_post = <<_END_;
    */
    </SCRIPT>
    _END_
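    A minimal sketch of the unclosed-tag case, assuming the script-stripping pattern is written with `[^>]*` in the opening tag and an escaped slash in the closing tag. The helper name strip_scripts and the extra "Hi!" and "Second post body" text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove <SCRIPT>...</SCRIPT> blocks, treating the
# end of the string as an implicit closing tag.
sub strip_scripts {
    my ($text) = @_;
    $text =~ s/<SCRIPT[^>]*>.*?(?:<\/SCRIPT>|$)//gis;
    return $text;
}

# Two consecutive posts: the first opens a script and never closes it.
my $first_post  = qq{Hi!\n<SCRIPT LANGUAGE="Javascript">\ndocument.location.href = "http://www.hamsterdance.com"\n/*\n};
my $second_post = qq{*/\n</SCRIPT>\nSecond post body\n};

my $clean_first  = strip_scripts($first_post);
my $clean_second = strip_scripts($second_post);

print "first:  [$clean_first]\n";
print "second: [$clean_second]\n";
```

    The first post loses everything from the opening tag onward, so it can no longer swallow the posts that follow it. The second post is left alone, because a stray closing tag with no opening tag never matches the pattern.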

    I actually used this once :)

    HTH

    cLive ;-)

      You should also look for onLoad and other attributes inside IMG tags, etc.

      -Lee

      "To be civilized is to deny one's nature."
        Valid point. But I think other event handlers also need covering - if we're gonna cover one, we'd better cover *anything* that can trigger code.

        And I guess you should strip all links that start "javascript:" - arghhhh.

        So I guess we'd need to add something like:

        # javascript: (leave an empty quoted value behind)
        $text =~ s/(["'])\s*javascript:.*?\1/$1$1/gis;

        # event handlers (on + 4 chars is min length)
        $text =~ s/\bon\w{4,}\s*=\s*(['"]).*?\1//gis;

        Untested, but I think that might do the trick...

        Have I missed anything?

        cLive ;-)

Re: Checking forms for JavaScript
by Cubes (Pilgrim) on Jul 24, 2001 at 02:13 UTC
    I'm not sure if there's any way to pick javascript code out of some random block of text without doing some serious linguistic analysis.

    What's your objective here? I'm guessing you have some stuff from a web form that's going to go back onto a web page, and you don't want to allow any potentially nasty javascript. Based on that assumption...

    You could take the easy way out and just strip anything that looks like an HTML tag. Or you could strip out <script> tags, the string 'javascript:', and any html tag attributes that have to do with javascript (onClick, onMouseOver, etc.).

    Not being a javascript wiz, I'm sure I haven't covered everything, but this ought to give you a start. You might get more useful answers if you can post more details of what you're trying to accomplish with this code.

    Update: corrected para 3. Of course I meant strip out <script> tags

      That's exactly what I'm trying to do. I just don't want the nefarious citizens of the net to be sticking unpleasant scripts in my database ;) I apologize for not being more specific. Anyway, those suggestions look very promising. I thank you for your help. I'm not so much interested in completely removing scripts as in making them inoperable and having something warn me about a potentially script-laden post. I suppose I should have been more specific about that as well. Thanks again for the help.
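      One way to get "inoperable plus a warning" is to escape the HTML special characters and separately flag anything that looks like script. The sketch below assumes that approach. The helper names script_warnings and escape_html are hypothetical, and the three checks only mirror the suggestions in this thread, so they are certainly not exhaustive.

```perl
use strict;
use warnings;

# Hypothetical helper: return the reasons a post looks script-laden.
# An empty list means none of the three checks fired.
sub script_warnings {
    my ($text) = @_;
    my @found;
    push @found, 'script tag'      if $text =~ /<\s*script\b/i;
    push @found, 'javascript: URL' if $text =~ /javascript\s*:/i;
    push @found, 'event handler'   if $text =~ /\bon\w{4,}\s*=/i;
    return @found;
}

# Hypothetical helper: escape the characters HTML treats specially, so a
# browser shows any markup as literal text instead of interpreting it.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

my $post = q{<img src="x.gif" onLoad="alert('hi')">};
my @why  = script_warnings($post);
warn "Possible script in post: @why\n" if @why;
print escape_html($post), "\n";
```

      The checks are deliberately loose, so ordinary text such as "online = yes" will also raise a warning. The escaping is what actually makes the post inoperable when it is displayed.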
Re: Checking forms for JavaScript
by Anonymous Monk on Jul 24, 2001 at 16:30 UTC
    Forget the regexes. Use an HTML parsing module, like HTML::Parser or HTML::TokeParser, and parse the user input. This is sturdier and less magical than any regex, and it allows you to strip out any HTML, not just certain attributes or tags.
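    A minimal sketch of the parser approach, assuming HTML::Parser is installed from CPAN and that only the plain text of a post needs to be kept. The helper name strip_html is made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();    # CPAN module, not part of core Perl

# Hypothetical helper: return only the text of $html, dropping every tag
# and everything inside <script> and <style> elements.
sub strip_html {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Collect text events; 'dtext' asks for entity-decoded text.
        text_h => [ sub { $out .= shift }, 'dtext' ],
    );
    # Suppress all events for these elements, including their contents.
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;
    return $out;
}

print strip_html('<b>Hello</b> <script>alert(1)</script>world'), "\n";
```

    Because 'dtext' decodes entities, input such as &lt;script&gt; comes back as a literal <script> in the result, so the returned text still needs to be HTML-escaped before it goes onto a page.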