malaga has asked for the wisdom of the Perl Monks concerning the following question:

I am doing a search on text files, so the files that contain the search word have to be returned. But what I want to print is the date of the submission and the title of the submission, two fields within each file. The link has to point to the file. Here is the code:
#!/usr/bin/perl

# Define Variables #
$basedir = '/htdocs/';
$baseurl = 'http://website.com';
@files = ('*.dtl');
$title = "Search Again";
$title_url = 'http://website.com/';
$search_url = 'http://website.com/';

# Parse Form Search Information
&parse_form;

# Get Files To Search Through
&get_files;

# Search the files
&search;

# Print Results of Search
&return_html;

sub parse_form {
    # Get the input
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

    # Split the name-value pairs
    @pairs = split(/&/, $buffer);

    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $FORM{$name} = $value;
    }
}

sub get_files {
    chdir($basedir);
    foreach $file (@files) {
        $ls = `ls $file`;
        @ls = split(/\s+/,$ls);
        foreach $temp_file (@ls) {
            if (-d $file) {
                $filename = "$file$temp_file";
                if (-T $filename) {
                    push(@FILES,$filename);
                }
            }
            elsif (-T $temp_file) {
                push(@FILES,$temp_file);
            }
        }
    }
}

sub search {
    @terms = split(/\s+/, $FORM{'terms'});

    foreach $FILE (@FILES) {
        open(FILE,"$FILE");
        @LINES = <FILE>;
        close(FILE);

        #$string is what we will search
        $string = join(' ', @LINES);
        $string =~ s/\n//g;

        #$string2 is the title of the document
        @string2 = ($LINES[0]);
        $string2 =~ s/\n//g;
        $string2 =~ s/\<h3\>//ig;
        $string2 =~ s/\<\/h3\>//ig;
        $string2 =~ s/title: //;

        #$string3 is the submission date of the document
        @$string3 = ($LINES[15]);
        $string3 =~ s/\n//g;
        $string3 =~ s/submitted: //;
        #chomp ($string3);
        $string3 = scalar localtime $string3;

        if ($FORM{'boolean'} eq 'AND') {
            foreach $term (@terms) {
                if ($FORM{'case'} eq 'Insensitive') {
                    if (!($string =~ /$term/i)) {
                        $include{$FILE} = 'no';
                        last;
                    }
                    else {
                        $include{$FILE} = 'yes';
                    }
                }
                elsif ($FORM{'case'} eq 'Sensitive') {
                    if (!($string =~ /$term/)) {
                        $include{$FILE} = 'no';
                        last;
                    }
                    else {
                        $include{$FILE} = 'yes';
                    }
                }
            }
        }
        elsif ($FORM{'boolean'} eq 'OR') {
            foreach $term (@terms) {
                if ($FORM{'case'} eq 'Insensitive') {
                    if ($string =~ /$term/i) {
                        $include{$FILE} = 'yes';
                        last;
                    }
                    else {
                        $include{$FILE} = 'no';
                    }
                }
                elsif ($FORM{'case'} eq 'Sensitive') {
                    if ($string =~ /$term/) {
                        $include{$FILE} = 'yes';
                        last;
                    }
                    else {
                        $include{$FILE} = 'no';
                    }
                }
            }
        }

        if ($string =~ /<title>(.*)<\/title>/i) {
            $titles{$FILE} = "$1";
        }
        else {
            $titles{$FILE} = "$string2";
            $titles2{$FILE} = "$string3;
        }
    }
}

sub return_html {
    print "Content-type: text/html\n\n";
    print "<html>\n <head>\n <title>Results of Search</title>\n </head>\n";
    print "<body>\n <center>\n <h1>Results of Search in $title</h1>\n</center>\n";
    print "<b>Below are the results of your search sorted by submission date:</b><p><hr size=7 width=75%><p>\n";
    print "<ul>\n";
    foreach $key (keys %include) {
        if ($include{$key} eq 'yes') {
            print "<li><a href=\"passtest.cgi?$key\">$titles{key}<br>$titles2{$key}<br></a><br><br>\n";
        }
    }
    print "</ul>\n";
    print "<hr size=7 width=75%>\n";
    print "Search Information:<p>\n";
    print "<ul>\n";
    print "<li><b>Terms:</b> ";
    $i = 0;
    foreach $term (@terms) {
        print "$term";
        $i++;
        if (!($i == @terms)) {
            print ", ";
        }
    }
    print "\n";
    print "<li><b>Boolean Used:</b> $FORM{'boolean'}\n";
    print "<li><b>Case $FORM{'case'}</b>\n";
    print "</ul><br><hr size=7 width=75%><P>\n";
    print "<ul>\n<li><a href=\"$search_url\">Back to Search Page</a>\n";
    print "<li><a href=\"$title_url\">$title</a>\n";
    print "</ul>\n";
    print "<hr size=7 width=75%>\n";
    print "</body>\n</html>\n";
}

Replies are listed 'Best First'.
Re: malaga's hash/array/search problem
by Petruchio (Vicar) on Jan 18, 2001 at 13:11 UTC
    Well, let me give a few suggestions which do not bear directly upon your problem (the exact nature of which remains fuzzy to me).

    First, after #!/usr/bin/perl append -wT. The -w flag will give you helpful warnings, and the -T flag will help keep people from doing bad things with your script.

    Second, place the following line near the beginning of your program, right around line 2:

    use strict;

    Then go around and correct all the errors which pop up by using the my keyword. As this program stands, there's practically no reason to use subroutines at all. Since, from what I gathered in the CB, you're up against a very near deadline, maybe you could postpone this... but do get around to it ASAP. It's important. Besides, it will teach you a lot.

    Third, right around line three, insert the following line:

    use CGI qw/:standard/;

    Shortly thereafter, insert the following line:

    my %FORM = map {$_ => param($_)} param();

    Now you may eliminate the parse_form subroutine, since the form has been parsed, and the parameters stored in %FORM. The moral of this story is, use the CGI module. It's better and safer than doing this sort of thing yourself (as many monks will tell you in much stronger terms).

    The CGI module also allows you, in your return_html subroutine, to tidy things up considerably. Instead of saying:

    print "Content-type: text/html\n\n";
    print "<html>\n <head>\n <title>Results of Search</title>\n </head>\n";
    print "<body>\n <center>\n <h1>Results of Search in $title</h1>\n</center>\n";

    You can now say:

    print header, start_html('Results of Search'), h1({-align=>'center'},"Results of Search in $title");
    Next, @files contains but a single string, which means it's not much of an array. And then you only use it once, in a for loop. Now if you only own one dog, and you leave it with your friend, you can tell him, "make sure you feed each of my dogs". But you probably would rather say, "make sure you feed my dog". Hence, change the line:

    @files = ('*.dtl');

    to:

    $file = '*.dtl';

    and then delete the line:

    foreach $file (@files) {

    as well as the last } in the get_files subroutine.

    Oh, yeah... and what's with that line:

    @$string3 = ($LINES[15]);

    ? I'm guessing that's a typo, and that the @ doesn't belong there.
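
To illustrate why the stray @ matters: @$string3 dereferences $string3 as an array instead of assigning to the scalar, so $string3 never receives the line. A minimal sketch of the extraction with plain scalars follows. It assumes, as the posted script does, that "title: ..." is on line 1 and "submitted: <epoch seconds>" is on line 16; the sub name extract_fields and the sample record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the title and submission time out of the
# lines of one record. Assumes "title: ..." on line 1 and
# "submitted: <epoch seconds>" on line 16, as in the posted script.
sub extract_fields {
    my @lines = @_;

    # Plain scalars: no @ sigil, so nothing is dereferenced by accident.
    my $title = defined $lines[0]  ? $lines[0]  : '';
    my $date  = defined $lines[15] ? $lines[15] : '';

    chomp($title, $date);
    $title =~ s{</?h3>}{}ig;    # strip <h3> and </h3>
    $title =~ s/^title: //;
    $date  =~ s/^submitted: //;

    return ($title, $date);
}

# Sample record made up for illustration: 16 lines, date on the last one.
my @record = ("title: <h3>First post</h3>\n", ("filler\n") x 14, "submitted: 979820000\n");
my ($title, $date) = extract_fields(@record);
print "$title ($date)\n";
```

Returning both fields from one sub also keeps the variables lexical, which is what use strict will ask for anyway.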

    There's a bunch of other stuff, but I'm tired, and anyway, I don't want to be (more) pedantic. Really, as dws points out, this is a pretty good first effort. I can see why you're lost and frustrated, though. Your code is more complex than it needs to be. If you're able to simplify things, you'll see the logic behind what you're doing much more clearly.

    Also, implement these suggestions one at a time, and test as you go along. I have not tested them; the code I've written is off the top of my head, and my head is sleepy. Never trust code (at least) until you see it work. :-)

    Good luck.

Re: malaga's hash/array/search problem
by dws (Chancellor) on Jan 18, 2001 at 11:20 UTC
    As a first start, that's not too bad. There are several next steps you should take right away:
    1. Turn on warnings with -w
    2. use strict; and fix the problems it reports.
    3. Unbuffer STDOUT via $|++, and
    4. Move the line that prints "Content-type" to the top. (For a good explanation of this and the ones above, see nearly any of merlyn's Web Techniques articles.)

    Next, you have a couple of opportunities for improvement.

    1. Use CGI.pm to handle form processing correctly. Form processing is trickier to get right than many books (and code fragments) would have you believe. (Search the Monastery; examples abound.)
    2. Use opendir/readdir/closedir to read the directory from within Perl, rather than forking off ls.
    3. Read up on Perl's join function, and look for a place in your code to turn 7 lines into 1.
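
The opendir/readdir/closedir suggestion could look roughly like the sketch below. The sub name list_files and its arguments are hypothetical, and it assumes the files of interest sit directly in one directory and share an extension such as .dtl.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text files in $dir whose names end
# in $ext, without shelling out to ls.
sub list_files {
    my ($dir, $ext) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # Keep plain files (-f) that look like text (-T) and end in $ext.
    my @names = grep { /\Q$ext\E\z/ && -f "$dir/$_" && -T "$dir/$_" } readdir($dh);
    closedir($dh);

    return sort @names;
}
```

In the posted script this might be called as my @FILES = list_files('/htdocs', '.dtl'); the names come back without the directory prefix, so prepend it before opening each file.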

    Then there are some things you can do for extra credit:

    1. Use CGI.pm routines for generating HTML. (Search the Monastery; examples abound.)
    2. Look in the docs for what the /o modifier does for regexps.

    These steps will get your code into good-enough shape that when next you post it, people will be more likely to respond with feedback on your code's correctness.
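
On the /o point: /o tells Perl to interpolate and compile a pattern only once, which also means /$term/o inside a loop over several terms would keep matching the first term. A related technique, swapped in here instead of /o, is to compile each term once with qr// and reuse the compiled patterns. The sketch below folds the AND/OR and case branches of the posted search into one sub; the name matches and its argument order are hypothetical, and escaping the terms with \Q (so they match as literal text) is an assumption about what the search form intends.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text contain all (AND) or any (OR) of the terms?
sub matches {
    my ($text, $boolean, $case, @terms) = @_;

    # Compile each term once; \Q...\E makes the term match literally.
    my @patterns = map { $case eq 'Insensitive' ? qr/\Q$_\E/i : qr/\Q$_\E/ } @terms;

    # In scalar context grep returns the number of patterns that matched.
    my $hits = grep { $text =~ $_ } @patterns;

    return $boolean eq 'AND' ? ($hits == @patterns && @patterns > 0)
                             : $hits > 0;
}

print "match\n" if matches("Perl Monks search", 'AND', 'Insensitive', 'perl', 'monks');
```

With something like this, the search loop only needs to record the files for which matches(...) is true.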

      Also, use taint mode:
      #!/usr/bin/perl -wT
      Taint mode may help your script be more secure by drawing possibly insecure variables to your attention.
        Thank you. I appreciate the help. It took me a while to find this node and the posted replies. I'm still not sure how to find things on this site.
Re: malaga's hash/array/search problem
by a (Friar) on Jan 19, 2001 at 10:55 UTC
    Sorry if we've left you hanging. The above posts are, of course, spot on - you've lots to tidy up if you want to get -w/use strict to work and that'll help. But the sorting problem: Well, seems like you could do:
    $title = $1 if $LINES[0] =~ /title: (.*)$/i; # though $[ is better than zero
    $submitted_date = $1 if $LINES[15] =~ /submitted: (.*)$/i; # But how do you know it's line 16?
    but to get a sortable hash you could:
    # use push on the off chance you've got multiple files w/
    # the same sub date
    push @{$dates{$submitted_date}}, $FILE;
    The trick here is: in the dates hash, use the submitted_date/string3 value as the key, but make the value a reference to an array. You can then push on as many file names as share the same submitted date, and read them back later when you're sorting through the dates hash: sort the dates keys, then walk each array to get the file names back. Trust me, it works! So when you get down to:
    foreach $key (keys %include) {
        if ($include{$key} eq 'yes') {
            print "<li><a href=\"passtest.cgi?$key\">$titles{key}<br>$titles2{$key}<br></a><br><br>\n";
        }
    }

    # try instead:
    foreach my $date ( sort keys %dates ) {
        foreach $key ( @{$dates{$date}} ) {
            if ($include{$key} eq 'yes') {
                print "<li><a href=\"passtest.cgi?$key\">$titles{$key}<br>$titles2{$key}<br></a><br><br>\n";
            } # if include eq yes
        } # foreach key @dates
    } # foreach date dates
    That works if the original submitted: date is in epoch time (or whatever it is), so that the sorted keys come out in date order.
    More cleanup: you probably want to put a $titles2{$FILE} = $string3; in the "if ( $string =~ /<title ... " block, or rather, take it out of the else and always set "$titles2{$FILE} = $string3;". Though now you can (inside the foreach loops) do:
    $pretty_date = scalar localtime $date;
    and replace $titles2{$key} w/ $pretty_date and skip %titles2 altogether.
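
Put together, the hash-of-arrays idea might look like the sketch below. The sample data and the sub name files_by_date are hypothetical. It assumes the submitted: values are epoch seconds, and it sorts the keys numerically with <=> rather than with the default string sort, which only orders epoch values correctly while they all have the same number of digits.

```perl
use strict;
use warnings;

# Hypothetical sample data: file name => submission time in epoch seconds.
my %submitted = (
    'a.dtl' => 979820000,
    'b.dtl' => 979700000,
    'c.dtl' => 979820000,    # same date as a.dtl
);

# Group file names under their date; each hash value is an array reference.
my %dates;
for my $file (sort keys %submitted) {
    push @{ $dates{ $submitted{$file} } }, $file;
}

# Hypothetical helper: file names ordered by date, oldest first.
sub files_by_date {
    my ($dates) = @_;
    return map { @{ $dates->{$_} } } sort { $a <=> $b } keys %$dates;
}

# Print each date in readable form followed by the files submitted then.
for my $date (sort { $a <=> $b } keys %dates) {
    my $pretty_date = scalar localtime $date;
    print "$pretty_date: @{ $dates{$date} }\n";
}
```

Because the keys stay as raw epoch numbers and are only converted with localtime at print time, the sort order is never affected by the display format.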
    You can probably skip the '$include{$FILE} = "no";' part: as you never use these files, why put them in %include at all? That'll save you the 'eq "yes"' test later. Also 'print ", " unless $i == @terms' is a bit more perlish, though using:
    while ($term = shift @terms ) {
        print "$term";
        print ", " if @terms;
    }
    is even uh, er, cooler. pointy hat, I'm sure, knows 3 better ways ;->.
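
One of those better ways is probably the join function mentioned earlier in the thread. A minimal sketch, with the sub name format_terms being hypothetical: unlike the shift loop, it leaves @terms intact and does not stop early on a term such as "0".

```perl
use strict;
use warnings;

# Hypothetical helper: format the search terms as a comma-separated list.
sub format_terms {
    my @terms = @_;
    return join(', ', @terms);
}

print format_terms('perl', 'monks', 'search'), "\n";    # prints "perl, monks, search"
```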
    HTH

    a

      a, thank you. I haven't learned how to use this site very well yet, so I didn't get my thanks to you. I appreciate your help. The sort problem was interrupted. Now I'm trying to make sense of this site so I can start writing for it. It's a web of CGIs - nothing static. Later, malaga