Re: Search for text from user input

Re^2: Search for text from user input by sierpinski (Chaplain) on May 12, 2010 at 19:28 UTC
I would strongly recommend using -wT in the shebang line (the first line): w turns on warnings, and T turns on taint checking (as was just mentioned above). Taint mode forces you to run your input through a regex that accepts only valid characters (0-9, a-z, A-Z, _, -, etc.) rather than code that could be executed on your webserver. I also recommend 'use strict;' to enforce proper programming technique. Incorporating your $search in a grep or in your original regex would fix your problem. Right now you're not looking for what was typed, only for things with 'jpg' in the name. You can also shorten your regex by using the 'i' modifier after the last slash to make your search case-insensitive. I'd suggest you do some searching on Google for regexes and how they work (you might find a good tutorial here; I haven't looked lately).
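As a rough illustration of the two suggestions above (the `i` modifier and adding the search term to the test), here is a minimal sketch. The helper name `wanted` and the sample file names are made up for the example. It also leaves the pipes out altogether: in a Perl regex, a backslash-escaped `\|` matches a literal pipe character rather than acting as alternation.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name looks like a JPEG and contains $search.
sub wanted {
    my ($name, $search) = @_;
    return $name =~ /\.jpe?g$/i          # one case-insensitive extension test
        && $name =~ /\Q$search\E/i;      # \Q...\E treats $search as literal text
}

# Sample names invented for the demo.
my @names = ('Beach.JPG', 'beach_notes.txt', 'city.jpeg', 'sunbeach.jpg');
print "$_\n" for grep { wanted($_, 'beach') } @names;
```

The `\Q...\E` quoting is a design choice, not something the thread asks for: it keeps characters such as `.` or `*` in the typed text from being read as regex syntax.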
Re^2: Search for text from user input by Nathan_84 (Acolyte) on May 12, 2010 at 19:46 UTC
How do I sanitize my input? `#!/usr/bin/perl -w -T` I've tried using <STDIN> and adding it to grep, but I'm unable to get it to work. I'm not sure how I'm meant to add the input to grep. Thanks
Re^3: Search for text from user input by ww (Archbishop) on May 13, 2010 at 02:30 UTC
Update: This was intended as an answer to re ^2; specifically, how to untaint. Apologies for any confusion caused by my confusion. :-)

Anonymonk gave you the bullet version; sierpinski provided the details.

Very simply, write a regular expression to reject anything which is NOT acceptable. For your purposes, acceptable input might well be constrained to `/^[A-Za-z0-9]+\.jpg$/i` ...that is, a name made up of one or more upper- or lowercase alpha characters or digits, followed by a period and "jpg". The "^" and "$" mark the beginning and end of your $search string, thus preventing someone from sending you a file called `foo.jpg.delete_everything.exe`.

Alternately, you could test for the unacceptable characters by using `/^[^A-Za-z0-9]+\.jpg$/i` ...which uses the inverse set: a name made up only of characters that are NOT upper- or lowercase alphas or digits matches, in which case you would want to reject anything that DOES match this one. (Note that this is not a true complement of the first pattern: a name such as `foo.exe` matches neither, so a negated match with `!~` against the first pattern is the safer test.) If you wish to accept ".jpeg" you'll need to extend these regexen.

BTW, the shebang is better written as `#!/usr/bin/perl -wT`; I suspect your version will fail. And, for your own sanity and safety: `use strict; use warnings;`

ALWAYS untaint any user input that's coming from anyone other than you, yourself.

Read (re-read?) Ovid's CGI course (Super Search will find a recent link for you if it's not currently listed in Tutorials) and perlretut.

Use `chomp` rather than chop when you're trying to remove the newline from input: chop removes the last character whatever it is, while chomp removes only a trailing newline.

And, re line 13 in your re ^3, below: the single quotes around `$search` do not do what you want. Inside `m/.../` they are literal characters and the variable is still interpolated, so you're telling the regex to match a period, a single quote, the contents of $search, and another single quote. Read about interpolation: oversimplified, a variable inside a single-quoted string is treated as a literal; a var inside a double-quoted string, or inside a regex delimited by slashes with NO QUOTES AT ALL, is interpolated (meaning, its content is used).
See walkingthecow's regex, but don't use that code without adding -wT, at which point you will have to include a routine (regex) to untaint the untrusted user input. And, as to your question in re ^3, consider: where do you expect the value of `$_` to come from? Again, see walkingthecow's answer, below.
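To make the untainting step concrete, here is a minimal sketch built on the pattern discussed above. The helper name `untaint_jpg_name` is hypothetical, and the shebang is left out so the sketch runs with a plain `perl` call. Under `-T`, text captured by a regex group is what Perl treats as untainted, which is why the helper returns `$1` rather than the original input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the validated name, or undef if it is rejected.
sub untaint_jpg_name {
    my ($input) = @_;
    return unless defined $input;
    chomp $input;                          # strip a trailing newline, if any
    if ($input =~ /^([A-Z0-9]+\.jpg)$/i) {
        return $1;                         # captured text counts as untainted
    }
    return;                                # anything else is rejected
}

# Sample inputs invented for the demo.
for my $candidate ("Cde.jpg\n", 'foo.jpg.delete_everything.exe', '../etc/passwd') {
    my $clean = untaint_jpg_name($candidate);
    (my $shown = $candidate) =~ s/\n//;    # tidy the name for printing only
    print(defined($clean) ? "accepted: $clean\n" : "rejected: $shown\n");
}
```

Returning undef on failure, instead of trying to repair the input, keeps the rule simple: only names that fit the whitelist ever reach the rest of the script.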
Re^4: Search for text from user input by ww (Archbishop) on May 13, 2010 at 13:56 UTC
Warning about the above for the future reader: Both regexen were originally written without the `/i`. Then (belatedly) recognition of the need for case insensitivity on the file extension set in. /me (insufficient thought) just added /i... without fixing the rest of the regexen. Duh.

That kind of thoughtlessness during late-night (or early morning) code revision has bitten me before. Perhaps this will warn others. The regexen don't need the `a-z` when the `i` is added. Obvious? Yes, but not at the (befuddled) time.

And the second version would be better and more clearly written with a negated match: `if ($item !~/^[A-Z0-9]+\.jpg$/i) {`

Tested:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 839761

    my @foo = ("a.jpg", "b.txt", "Cde.jpg", "abc.jpg.delete_everything.exe", "123.jpg", "\|#&.exe");

    print "Using negative match, '!~' - items which match should be excluded:\n\n";
    for my $item (@foo) {
        if ($item !~ /^[A-Z0-9]+\.jpg$/i) {
            print "\t--> (negated) match: $item \n";
        } else {
            print "no match in \$item: $item \n";
        }
    }

    print "\n\n" . 'Now using /^[A-Z0-9]+\.jpg$/i' . " so matched should be accepted: \n\n";
    for my $item (@foo) {
        if ($item =~ /^[A-Z0-9]+\.jpg$/i) {
            print "\t --> match: $item \n";
        } else {
            print "no match in \$item: $item \n";
        }
    }
Re^3: Search for text from user input by Nathan_84 (Acolyte) on May 12, 2010 at 21:23 UTC
This is my attempt; however, it doesn't work and I get the error message: Use of uninitialized value $_ in pattern match (m//) at search2.pl line 11, <STDIN> line 1. Any ideas?

    #!/usr/bin/perl -w -T
    print "Please enter text for string search? \n";
    $search=<STDIN>;
    chop $search;
    opendir(DIR, ".");
    @files = grep (/\.jpg\|\.jpeg\|\.JPEG\|\.JPG/, readdir(DIR)) and ($_ =~ m/\.'$search'/, readdir(DIR));
    closedir(DIR);
    foreach $file (@files) {
        print "$file\n";
    }
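For comparison, here is one possible way the attempt above could be restructured. It is a sketch, not the fix given in the thread. The helper name `matching_images`, the scratch directory, and the file names are invented for the example, and a fixed string stands in for `<STDIN>` so it runs unattended. The key point is that `$_` only holds a file name inside the block that `grep` runs once per entry; in the attempt above, the second test sits outside the `grep`, where `$_` was never set.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: list JPEG files in $dir whose names contain $search.
sub matching_images {
    my ($dir, $search) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    # The block runs once per directory entry, with the entry name in $_.
    my @files = sort grep { /\.jpe?g$/i && /\Q$search\E/i } readdir($dh);
    closedir($dh);
    return @files;
}

# Demo against a scratch directory so the sketch runs anywhere.
my $dir = tempdir(CLEANUP => 1);
for my $name ('beach.jpg', 'Beach2.JPEG', 'beach.txt', 'city.jpg') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my $search = "beach\n";   # stands in for: my $search = <STDIN>;
chomp $search;            # chomp removes only a trailing newline
print "$_\n" for matching_images($dir, $search);
```

Under `-T`, `$search` would also need to pass an untainting regex before being used for anything that touches the system; that step is left out here to keep the directory scan in focus.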