kbeen has asked for the wisdom of the Perl Monks concerning the following question:
I have a multi-part situation. The project I am working on has users coming to my client's website, connecting to a partner company's database for a search function, and displaying the results on my client's website. We have permission to do this, but only via HTTP. I cannot connect directly to their database; I have to emulate their search form and have the users send the values to our server, where the script then connects to the partner company's website as a normal web client would, gets the search result, parses the HTML, and sends it back to the original users in my client's look and feel (also in a different language, which is the whole purpose of this).
The problem I am facing is speed. As you can see, the server in the middle doesn't help speed the process up, but there is one more aspect... Among the results from the partner's website, there are those that my client can legally display and those that they cannot. The only way to know if they can display a result is to follow the resulting link and then parse THAT page for a certain phrase. I have decided to create a database (just a text file) that records the ID of all photos that are legal, so future queries can use this rather than connecting to the partner's server again. So, in effect, the process looks like this:
1) Original user comes to client's website.
2) Original user enters a search phrase.
3) Query sent to our server.
4) Script takes needed info from original query.
5) Script sends HTTP query to partner's website.
6) Partner's site sends results back to our server as HTML.
7) Script parses results for needed result ids.
8) Script must determine individually if each result id is legally acceptable.
so...
9) Script checks each result against database file of OK/NOT OK entries.
10) If the result is not in the file... reconnect to partner server to get the next page.
11) Parse the resulting page looking for the phrase that tells me if I can use it or not.
12) Record results (OK / NOT OK) in the database file for the future.
13) Add OK results to OK list.
14) Finally I am ready to display the OK results to the original user (40+ seconds later in some cases!!!).
I think the database file will work nicely if people often search with the same keywords (in that case there is no problem). I will also stock the database file as much as I can with common queries before going live, but I really want to make the parsing of this file as efficient as possible, as it could grow to 100,000 or more entries, and until it reaches that stage, any results NOT in the file must be looked up via HTTP, which takes valuable time. Once it reaches that stage, I am afraid my parsing method will be slow... Anyplace I can cut out even a second will be tremendously helpful.
Currently, the db file ($image_list in the snippet below) has entries like the following:
Y12398983981293
Y23981098310983
N98230498209480
Y23487289374987
Where Y means (Y)es, ok to use, and N is (N)o. The number is then the result id from the partner site.
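Since each record is just a one-letter flag followed by an ID, the whole file can be slurped once into a hash keyed by ID; each membership check then becomes a single hash lookup instead of a regex scan over an ever-growing string. A minimal sketch (the load_status name is mine, not from the code below):

```perl
use strict;
use warnings;

# Load the Y/N status file into a hash keyed by image ID.
# Assumes the format shown above: one "Y<id>" or "N<id>" record per line.
sub load_status {
    my ($file) = @_;
    my %status;
    open my $fh, '<', $file or return %status;   # empty hash if no file yet
    while (my $line = <$fh>) {
        chomp $line;
        my ($flag, $id) = $line =~ /^([YN])(\d+)/ or next;
        $status{$id} = $flag;                    # 'Y' = ok, 'N' = not ok
    }
    close $fh;
    return %status;
}

# Usage: 'Y' => approved, 'N' => condemned, absent => unknown (check via HTTP).
# my %status = load_status($image_list);
# push @display_thumbs, $id
#     if defined $status{$id} and $status{$id} eq 'Y';
```

This also sidesteps a subtle bug in the substring approach: a regex like /Y$id/ can match in the middle of a longer ID, while a hash key either matches exactly or not at all.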
Then this snippet:

open (IMAGE_LIST, "$image_list");
while (<IMAGE_LIST>) {
    $approve .= $_ if /^Y/;
    $condem  .= $_ if /^N/;
}
close (IMAGE_LIST);

open (IMAGE_LIST, ">>$image_list");
for (my $i = 0; $i < $thumb_count; $i++) {
    if ($approve =~ /Y$thumb_id_list[$i]/) {
        print "already had $thumb_id_list[$i] on record!!!<BR>";
        push (@display_thumbs, $thumb_id_list[$i]);
    }
    if (($approve !~ /Y$thumb_id_list[$i]/) && ($condem !~ /N$thumb_id_list[$i]/)) {
        # This connects to the partner website to check.
        my $confirmed_thumb = &check_image($session, $thumb_id_list[$i]);
        if ($confirmed_thumb) {
            print IMAGE_LIST "Y$confirmed_thumb\n";
            push (@display_thumbs, $thumb_id_list[$i]);
        }
        else {
            print IMAGE_LIST "N$thumb_id_list[$i]\n";
        }
    }
}
close (IMAGE_LIST);
Is there a better way to do it... like losing the line breaks, or a better matching expression? The file will grow quite large, and saving a second or two will really help.
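One format that removes the parsing step altogether: keep the verdicts in a DBM file tied to a hash, so the script never reads the whole list at all — each lookup goes straight to the on-disk index. A sketch using SDBM_File, which ships with Perl (the 'image_status' filename is an assumption; DB_File or GDBM_File work the same way):

```perl
use strict;
use warnings;
use Fcntl;       # for the O_RDWR / O_CREAT flags
use SDBM_File;   # ships with Perl; DB_File or GDBM_File are drop-in alternatives

# Tie a hash to an on-disk DBM file: lookups and inserts hit the disk
# index directly, so each CGI hit pays nothing for the file's size.
tie my %status, 'SDBM_File', 'image_status', O_RDWR | O_CREAT, 0666
    or die "Cannot tie DBM file: $!";

my $id = '12398983981293';            # hypothetical result id
$status{$id} = 'Y';                   # record an approval; persists on disk

if (exists $status{$id} and $status{$id} eq 'Y') {
    # ok to display this result
}

untie %status;
```

Writes also become safer than appending lines: updating a key overwrites the old verdict instead of accumulating duplicate records.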
One function I would like to add: while the top page is being displayed, the script would gather further results via HTTP in the background, so when the user hits the next-page button, the results are already prepared. What is the best / safest way to make a process run in the background, where the original process doesn't wait for the background process to finish? Even if the user never hits the "next" button, I will have stocked valuable info in the database file, so I would like to do this.
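The usual CGI approach to this is to fork after the first page has been printed: the parent finishes and the child closes its inherited handles and detaches, so the web server isn't left waiting on it. A minimal sketch of that pattern (the function name is mine; the &check_image call it alludes to is from the snippet above; a cron-driven fetch queue would be a sturdier alternative):

```perl
use strict;
use warnings;
use POSIX 'setsid';

# Fork a child to pre-fetch the next page of verdicts in the background.
# The parent returns immediately and finishes serving the user; the child
# detaches so the web server does not wait on it.
sub prefetch_in_background {
    my (@ids_to_check) = @_;   # hypothetical: ids still needing HTTP lookups

    defined(my $pid = fork) or die "fork failed: $!";
    return if $pid;            # parent: continue and exit normally

    # Child: let go of the server connection so the page can finish.
    close STDIN;
    close STDOUT;
    close STDERR;
    setsid();                  # new session, no controlling terminal

    for my $id (@ids_to_check) {
        # &check_image($session, $id) would go here, appending its
        # Y/N verdict to the database file as before.
    }
    exit 0;                    # never fall back into the CGI code
}
```

One caveat: each search could spawn a child, so under load you may want to cap or queue these rather than fork unconditionally.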
In summary, my questions are...
1) If I want to save a second or two on the parsing of the database file, what would be the best format in which to write this file, and to parse it?
2) Any advice or clues about the best way to start the background process while the original CGI process is able to finish independently?
Any suggestions regarding any aspect of this would be appreciated. If I am way off base with my method above... please let me know... I still have time to make a total turnaround.
Kbeen.
Replies are listed 'Best First'.

Re: I need speed...
  by dirthurts (Hermit) on Oct 07, 2001 at 09:33 UTC
    by kbeen (Sexton) on Oct 07, 2001 at 10:26 UTC
    by tstock (Curate) on Oct 07, 2001 at 10:54 UTC
Re: I need speed...
  by blakem (Monsignor) on Oct 07, 2001 at 11:25 UTC
    by Aristotle (Chancellor) on Oct 07, 2001 at 13:19 UTC
    by blakem (Monsignor) on Oct 07, 2001 at 13:40 UTC
    by perrin (Chancellor) on Oct 08, 2001 at 00:27 UTC
    by Aristotle (Chancellor) on Oct 08, 2001 at 02:51 UTC
Re: I need speed...
  by tstock (Curate) on Oct 07, 2001 at 10:46 UTC
Re: I need speed...
  by pjf (Curate) on Oct 07, 2001 at 13:40 UTC
Re: I need speed...
  by Aristotle (Chancellor) on Oct 07, 2001 at 14:38 UTC
Re: I need speed...
  by theorbtwo (Prior) on Oct 07, 2001 at 11:03 UTC
    by perrin (Chancellor) on Oct 08, 2001 at 03:20 UTC
Re: I need speed...
  by kbeen (Sexton) on Oct 07, 2001 at 10:39 UTC
    by tstock (Curate) on Oct 07, 2001 at 11:20 UTC
Re: I need speed...
  by ralphie (Friar) on Oct 07, 2001 at 20:53 UTC