in reply to Separating multiple keyword search input
Having seen a couple of your posts, I'd like to make a suggestion before I get to my main response: you will have a lot more luck getting help if you take the time to reduce your problem to the simplest possible terms. For most questions related to language usage, it is rare that you can't illustrate the problem in fewer than 20 lines.
Think about it: the helpful monks have only so much time to offer. Given the choice between a small, succinct tidbit and a long, incoherent, out-of-context, "I'll just dump some code into a question and someone will be able to fully grasp it and tell me what is going wrong" question... which do you think they'll answer?
There are resources that describe how to ask a question so that it will get answered; consider availing yourself of them.
Anyway. Rant concluded.
So far as I can infer from your other postings, you have two sets of data ("normal" and "premium"); you want to accept keywords off a web form, search through those sets for those keywords, then display the results (keeping the results from the two sets distinct).
It further looks like that data might be large and stored in a MySQL database. The data is also apparently structured into tab-separated fields of: unknown, title, description, unknown, unknown, keywords.
Without knowing your entire system (and I don't want to learn it -- I just want you to think about it), we can start biting off chunks. The two phases of processing you show above are interpreting form parameters, then searching through some result lines (it's not clear where those lines came from).
    use Carp;   # for croak

    # find_hits $file_name, @keywords;
    #
    # Returns a list of "score: line" strings.  Example:
    #
    # If file "foo.txt" contains (fields are tab-separated in the
    # real file; commas are shown here for readability):
    #
    #   Larry Wall, Programming Perl, Reference, x, y, perl
    #   Peter Scott, Perl Medic, Legacy herding, z, a, perl
    #
    # A call to this function might look like this:
    #
    #   my @hits = find_hits "foo.txt", "medic";
    #
    # And it would return a one-element list:
    #
    #   '016: Peter Scot...a, perl'

    sub find_hits( $ @ )
    {
        my ( $file, @keywords ) = @_;

        # compose a regex for quick rejection of non-matching lines:
        my $any_keyword_re = join '|', map quotemeta($_), @keywords;
        $any_keyword_re = qr/$any_keyword_re/i;

        # and a regex for the whole phrase (keywords quoted here too,
        # so regex metacharacters in user input stay literal)
        my $phrase_re = join '\W+', map quotemeta($_), @keywords;
        $phrase_re = qr/$phrase_re/i;

        # open input file for read
        open my $in, '<', $file
            or croak "opening $file for read: $!";

        my @rv; # for accumulating return values

        while ( <$in> )
        {
            # reject lines with no matches out of hand
            next unless m/$any_keyword_re/;

            # any match at all is one point.
            my $score = 1;

            # split into fields for further scoring.
            my ( undef, $title, $desc, undef, undef, $keys ) = split /\t/;

            # title matches are worth 5 points each
            while ( $title =~ m/$any_keyword_re/g ) { $score += 5 }

            # description matches are only 1 point
            while ( $desc =~ m/$any_keyword_re/g ) { $score += 1 }

            # keyword matches are 4 points
            while ( $keys =~ m/$any_keyword_re/g ) { $score += 4 }

            # phrase matches (against entire line) are 10 points
            while ( m/$phrase_re/g ) { $score += 10 }

            # multiple matches are worth 10x the number
            # of keyword matches found on the line.
            my $n_matches = () = m/$any_keyword_re/g; # see perlfaq4
            if ( $n_matches > 1 ) { $score += 10*$n_matches }

            # finally, format $score and save for returning
            # to the caller
            push @rv, sprintf "%03d: %s", $score, $_;
        }

        return @rv;
    }
Ok, so now we address your other main issue, that of doing a substitution in your template. Since we can have a variable number of responses to each, but you only have one template variable, I'll assume that we can wedge all our answers into that one spot. We'll do this by joining together all the hits we found.
    # validate our keywords.
    unless ( defined $fields{keywords} && $fields{keywords} ne '' )
    {
        # complain, bail out of this run.
    }

    my @keywords = split ' ', $fields{keywords};

    # find actual matches in highest- to lowest-score order.
    # the explicit sort block keeps perl from treating find_hits
    # as the comparison routine; the string comparison relies on
    # the zero-padded scores.
    my @normal_hits  = sort { $b cmp $a } find_hits( "normal_list.txt",  @keywords );
    my @premium_hits = sort { $b cmp $a } find_hits( "premium_list.txt", @keywords );

    # keep lists reasonable
    if ( @normal_hits  > 100 ) { splice @normal_hits,  100 }
    if ( @premium_hits > 100 ) { splice @premium_hits, 100 }

    # now join them together for presentation:
    my $normal_hits  = join '<br />', @normal_hits;
    my $premium_hits = join '<br />', @premium_hits;

    # finally, do the now-obvious substitution:
    $template =~ s/%%normalresults%%/$normal_hits/;
    $template =~ s/%%premiumresults%%/$premium_hits/;
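One subtlety worth knowing when you sort whatever a function returns: Perl parses `sort NAME LIST` as "sort LIST using NAME as the comparison routine", so a bare function name right after `sort` is not called the way you might expect. Giving `sort` an explicit comparison block sidesteps that. Here is a minimal sketch; `get_scores` is a hypothetical stand-in for `find_hits`, and the strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for find_hits: returns "score: line" strings.
sub get_scores { return ( "006: b", "016: a", "001: c" ) }

# With an explicit comparison block, get_scores() is unambiguously
# the LIST being sorted.  $b cmp $a gives highest score first, which
# only works as a string comparison because the scores are zero-padded.
my @descending = sort { $b cmp $a } get_scores();

print "$_\n" for @descending;
```

Note that the string comparison stops ordering correctly once a score needs more digits than the padding allows, so widen the `%03d` format if scores can pass 999.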
Hopefully this gets you closer. There are all sorts of places where this code might need tweaking: if there could be a huge number of hits, you can't store them all in memory, so you'll need to keep track of just the top 100 (or whatever) the whole way through. You'll also need to handle special characters in the returned strings before printing them as HTML, and decide how to handle more advanced CSV files (e.g. double quotes protecting a comma inside a field).
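To make those tweaks a little more concrete, here is a rough sketch of all three. The helper names `keep_top` and `escape_html` are mine, purely for illustration, and the sample strings are invented; the quoted-field splitting uses `parse_line` from the core `Text::ParseWords` module. Treat it as a starting point under those assumptions, not a finished solution.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: add $hit to @$top, keeping only the $limit
# highest-sorting "score: line" strings.  Re-sorting on every insert
# is fine for a limit around 100; a heap would scale better.
sub keep_top {
    my ( $limit, $top, $hit ) = @_;
    push @$top, $hit;
    @$top = sort { $b cmp $a } @$top;
    splice @$top, $limit if @$top > $limit;
    return;
}

# Hypothetical helper: escape the characters that are special in HTML.
# The ampersand must be handled first so the other entities survive.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Track only the top 2 hits as they arrive, then escape for display.
my @top;
keep_top( 2, \@top, $_ ) for "006: a < b", "016: R&D", "001: plain";
my $html = join '<br />', map { escape_html($_) } @top;
print "$html\n";    # prints: 016: R&amp;D<br />006: a &lt; b

# Split a comma-separated line where double quotes protect a comma.
# The second argument (0) tells parse_line to drop the quotes.
my @fields = parse_line( ',', 0, 'Larry Wall,"Perl, the language",ref' );
print scalar(@fields), " fields; second is '$fields[1]'\n";
```

In the real thing you would call `keep_top` from inside the `while ( <$in> )` loop instead of pushing every hit onto the return list, and run `escape_html` over each hit just before the join.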
None of these are insurmountable, but until and unless you develop the skill to break down problems into simpler, more digestible chunks (both for asking questions and for writing the solution in the first place), no amount of cookbookery will help you. Good luck.