in reply to Search for text from user input

Perhaps try using the value of $search somewhere in your code? A second grep would be good.

Also, be sure to sanitize your input or it will run arbitrary code many times.

Re^2: Search for text from user input
by sierpinski (Chaplain) on May 12, 2010 at 19:28 UTC
    I would strongly recommend using -wT in the shebang line (the first line): w turns on warnings, and T turns on taint checking (as was just mentioned above). Taint checking forces you to run your input through a regex that accepts only valid characters (0-9, a-z, A-Z, _, -, etc.) before you use it for anything risky, so that it cannot end up as code executed on your web server.

    I also recommend 'use strict;' to force proper programming technique.

    Incorporating your $search in a grep or in your original regex would fix your problem. Right now you're not looking for what was typed, only things with 'jpg' in the title. You can also shorten your regex by using the 'i' modifier after the last slash to make your search case-insensitive. I'd suggest you do some searching on Google for regexes (you might find a good tutorial here, I haven't looked lately) and how they work.
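
    A minimal sketch of that suggestion, assuming the file names are already in a list. The names below are made up, and matching_images is a hypothetical helper, not something from the original script. It filters on the extension case-insensitively (anchored at the end of the name, which is a little stricter than the original pattern) and on the search term, using quotemeta so the typed text is matched literally rather than as a regex:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the image names that contain $search.
sub matching_images {
    my ($search, @names) = @_;
    my $term = quotemeta $search;    # treat the user's text literally
    # grep sets $_ to each name in turn; both tests must pass.
    return grep { /\.jpe?g$/i && /$term/i } @names;
}

# Made-up file names standing in for what readdir() would return.
my @names = ('Holiday.JPG', 'notes.txt', 'holiday2.jpeg', 'cat.jpg');
print "$_\n" for matching_images('holiday', @names);
```

    In a real script the list would come from readdir, and under -T the search term would still need to be untainted before being used for anything other than matching.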
Re^2: Search for text from user input
by Nathan_84 (Acolyte) on May 12, 2010 at 19:46 UTC

    How do I sanitize my input?

    #!/usr/bin/perl -w -T
    I've tried using <STDIN> and added it to grep, but I'm unable to get it to work. I'm not sure how I'm meant to add the input to grep.

    Thanks

      Update: This was intended as an answer to re ^2; specifically, how to untaint. Apologies for any confusion caused by my confusion. :-)
       

      Anonymonk gave you the bullet version; sierpinski provided the details. Very simply, write a regular expression to reject anything which is NOT acceptable -- for your purposes, acceptable input might well be constrained to

      /^[A-Za-z0-9]+\.jpg$/i

      ...that is, a name beginning with an upper or lowercase alpha character or a digit, followed by any number of alphas or digits, followed by a period and "jpg". The "^" and "$" mark the beginning and end of your $search string, thus preventing someone from sending you a file called

      foo.jpg.delete_everything.exe.

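      Alternatively, you could test for the unwanted characters by using

      /^[^A-Za-z0-9]+\.jpg$/i

      ...which uses a negated character class: it matches only names made up entirely of characters that are NOT upper or lowercase alphas or digits, followed by ".jpg", and you would want to reject anything that DOES match this one. Note that this is not a complete inverse of the first pattern (foo.jpg.delete_everything.exe matches neither), so accepting only what matches the first regex is the safer test. (If you wish to accept "*.jpeg" you'll need to extend these regexen.)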
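
      One way to put the first pattern to work as an untainting step, sketched under the assumption that only plain name.jpg input is acceptable. The untaint_name helper is hypothetical. Under -T, data pulled out of a regex capture ($1) is what counts as clean, so the pattern has to capture what it accepts. This sketch also uses \z rather than $ so that a trailing newline is not accepted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a clean copy of $input, or undef if it is not acceptable.
sub untaint_name {
    my ($input) = @_;
    return unless defined $input;
    # Accept only letters/digits followed by ".jpg"; keep the captured copy.
    if ($input =~ /^([A-Z0-9]+\.jpg)\z/i) {
        return $1;
    }
    return;    # everything else is rejected
}

for my $candidate ('photo1.jpg', 'foo.jpg.delete_everything.exe', '|#&.jpg') {
    my $clean = untaint_name($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: $candidate\n";
}
```

      Only the first candidate is accepted; the other two fail the match and are rejected.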

      BTW, the shebang is better written as

      #!/usr/bin/perl -wT

      I suspect your version will fail. And, for your own sanity and safety:

      • use strict; use warnings;
      • ALWAYS untaint any user input that's coming from anyone other than you, yourself
      • Read (re-read?) Ovid's CGI course (Super Search will find a recent link for you if it's not currently listed in Tutorials) and perlretut
      • Use chomp rather than chop when you're trying to remove the newline from input
      • and, re line 13 in your re ^3, below, the single quotes around $search are part of the pattern: inside m/.../ the variable is still interpolated, but the quote characters are matched literally, so you're telling the regex to look for a period, a single quote, the content of $search, and another single quote. Read about interpolation: oversimplified, a variable which is inside a single-quoted string is treated as a literal; a var inside double quotes -- or, in a regex, NOT INSIDE QUOTES AT ALL -- is interpolated (meaning, its content is used). See walkingthecow's regex -- but don't use that code without adding -wT, at which time you will have to include a routine (regex) to untaint the untrusted user input.
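
      A small sketch of the chomp/chop and quoting points, using made-up values. Note that inside a /.../ match, single quotes are ordinary characters to be matched and $search is still interpolated, so the quoted form only matches names that actually contain the quote marks:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chop removes the last character whatever it is;
# chomp removes only a trailing newline.
my $with_newline    = "cat\n";
my $without_newline = "cat";
chomp(my $chomped_a = $with_newline);      # newline removed
chomp(my $chomped_b = $without_newline);   # nothing to remove
chop(my $chopped    = $without_newline);   # a real character is lost
print "chomp gives: $chomped_a and $chomped_b; chop gives: $chopped\n";

# Quote characters inside a regex are literal; the variable is interpolated.
my $search = 'cat';
my $name   = 'my.cat.jpg';
my $quoted   = ($name =~ /\.'$search'/) ? 1 : 0;   # needs the quotes in the name
my $unquoted = ($name =~ /\.$search/)   ? 1 : 0;   # looks for ".cat"
print "with quotes: $quoted, without quotes: $unquoted\n";
```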

      And, as to your question in re ^3, consider: Where do you expect the value of $_ to come from? Again, see walkingthecow's answer, below.

        Warning about the above for the future reader:

        Both regexen were originally written without the /i. Then (belatedly) recognition of the need for case insensitivity on the file extension set in. /me (insufficient thought) just added /i... without fixing the rest of the regexen.

        Duh. That kind of thoughtlessness during late-night (or early morning) code revision has bit me before. Perhaps this will warn others.

        The regexen don't need the a-z when the i is added. Obvious? Yes, but not at the (befuddled) time.

        And the second version would be better and more clearly written with a negated match:  if ($item !~/^[A-Z0-9]+\.jpg$/i) {

        Tested:

        #!/usr/bin/perl
        use strict;
        use warnings;
        # 839761

        my @foo = ("a.jpg", "b.txt", "Cde.jpg", "abc.jpg.delete_everything.exe",
                   "123.jpg", "|#&.exe");

        print "Using negative match, '!~' - items which match should be excluded:\n\n";
        for my $item (@foo) {
            if ($item !~ /^[A-Z0-9]+\.jpg$/i) {
                print "\t--> (negated) match: $item \n";
            } else {
                print "no match in \$item: $item \n";
            }
        }

        # The message now names the pattern that is actually tested below.
        print "\n\n" . 'Now using /^[A-Z0-9]+\.jpg$/i' . " so matched should be accepted: \n\n";
        for my $item (@foo) {
            if ($item =~ /^[A-Z0-9]+\.jpg$/i) {
                print "\t --> match: $item \n";
            } else {
                print "no match in \$item: $item \n";
            }
        }

      This is my attempt; however, it doesn't work and I get the error message:

      Use of uninitialized value $_ in pattern match (m//) at search2.pl line 11, <STDIN> line 1.

      Any ideas?

      #!/usr/bin/perl -w -T
      print "Please enter text for string search? \n";
      $search=<STDIN>;
      chop $search;
      opendir(DIR, ".");
      @files = grep (/\.jpg|\.jpeg|\.JPEG|\.JPG/, readdir(DIR)) and ($_ =~ m/\.'$search'/, readdir(DIR));
      closedir(DIR);
      foreach $file (@files) {
      print "$file\n";
      }