Hello.
(I mean that with all my heart)

I have a multi-part situation. In my current project, users come to my client's web site, which connects to a partner company's database for a search function and displays the results on my client's web site. We have permission to do this, but only via HTTP. I cannot connect directly to their database, so I have to emulate their search form: the users send the values to our server, where the script connects to the partner company's web site as a normal web client would, gets the search result, parses the HTML, and sends it back to the original users in my client's look and feel (and in a different language, which is the whole purpose of this).

The problem I am facing is speed. As you can see, the server in the middle doesn't help speed the process up, but there is one more aspect... In the results from the partner's web site, there are those that my client can legally display, and those that they cannot. The only way to know if they can display a result is to follow its link and then parse THAT page for a certain phrase. I have decided to create a database (just a text file) that records which photo IDs are legal, so future queries can use this rather than connecting to the partner's server again. So, in effect, the process looks like this.

1) Original user comes to client's web site.
2) Original user enters a search phrase.
3) Query is sent to our server.
4) Script takes the needed info from the original query.
5) Script sends an HTTP query to the partner's web site.
6) Partner's site sends results back to our server as HTML.
7) Script parses the results for the needed result ids.
8) Script must determine individually whether each result id is legally acceptable,
so...
9) Script checks each result against the database file of OK / NOT OK entries.
10) If the result is not in the file... reconnect to the partner server to get the next page.
11) Parse the resulting page, looking for the phrase that tells me if I can use it or not.
12) Record the result (OK / NOT OK) in the database file for the future.
13) Add OK results to the OK list.
14) Finally I am ready to display the OK results to the original client (40+ seconds later in some cases!!!)
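
For what it's worth, steps 9 to 13 might be sketched roughly like this. It is only a sketch under assumptions: `filter_ok`, `%status`, and the `$check` callback are names made up for illustration (the callback stands in for the slow HTTP lookup against the partner site), and writing the answers back to the file is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ids that may be displayed.
#   $status - hash ref mapping id => 'Y' or 'N' (the answers already on record)
#   $check  - code ref standing in for the HTTP check; returns true if OK
#   @ids    - result ids parsed from the partner's result page
sub filter_ok {
    my ($status, $check, @ids) = @_;
    my @ok;
    for my $id (@ids) {
        # Steps 10-11: only ids with no record trigger the expensive lookup.
        unless (exists $status->{$id}) {
            # Step 12: remember the answer for future queries.
            $status->{$id} = $check->($id) ? 'Y' : 'N';
        }
        # Step 13: collect the approved ids.
        push @ok, $id if $status->{$id} eq 'Y';
    }
    return @ok;
}

# Example with a fake checker that approves even ids only.
my %status  = (101 => 'Y', 102 => 'N');
my $lookups = 0;
my @ok = filter_ok(\%status, sub { $lookups++; $_[0] % 2 == 0 }, 101, 102, 103, 104);
print "OK: @ok (lookups: $lookups)\n";
```

In the example only 103 and 104 reach the fake checker, because 101 and 102 are already on record.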

I think that the database file will work nicely if people often search with the same keyword (in that case there is no problem). I will also stock the database file as much as I can with common queries before going live. But I really want to make the parsing of this file as efficient as possible, as it could grow to 100,000 or more entries. Until it reaches that stage, any results NOT in the file must be looked up via HTTP, and this takes valuable time. Once it reaches that stage, I am afraid that my parsing method will be slow... Anyplace I can cut out even a second will be tremendously helpful.

Currently, the dbfile ($image_list) has entries like the following:

Y12398983981293\n Y23981098310983\n N98230498209480\n Y23487289374987\n

Where Y means (Y)es, ok to use, and N is (N)o. The number is then the result id from the partner site.

Then this snippet:
open (IMAGE_LIST, "$image_list");
while(<IMAGE_LIST>) {
    if ($_ =~ /^Y/) { $approve = "$approve$_"; }
    if ($_ =~ /^N/) { $condem = "$condem$_"; }
}
close (IMAGE_LIST);

open (IMAGE_LIST, ">>$image_list");
my $full_image;
for ($a = 0; $a < $thumb_count; $a++) {
    if ($approve =~ /Y$thumb_id_list[$a]/) {
        print "already had $thumb_id_list[$a] on record!!!<BR>";
        push (@display_thumbs,$thumb_id_list[$a]);
    }
    if (($approve !~ /Y$thumb_id_list[$a]/) && ($condem !~ /N$thumb_id_list[$a]/)) {
        # This connects to the partner website to check.
        my $confirmed_thumb = &check_image($session,$thumb_id_list[$a]);
        if ($confirmed_thumb) {
            print IMAGE_LIST "Y$confirmed_thumb\n";
            push (@display_thumbs,$thumb_id_list[$a]);
        }
        if (!$confirmed_thumb) {
            print IMAGE_LIST "N$thumb_id_list[$a]\n";
        }
    }
}
close (IMAGE_LIST);

Is there a better way to do it... like losing the line breaks, or a better matching expression? The file will grow quite large, and saving a second or two will really help.
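
One alternative that might be worth trying, sketched here under assumptions, is to read the file once into a hash keyed by id, so each check is a single hash lookup instead of a regex scan over one long string. The helper name `load_status` and the temporary sample file are made up for illustration; the line format is the Y/N format shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read "Y<id>" / "N<id>" lines into a hash of id => flag.
sub load_status {
    my ($path) = @_;
    my %status;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        # One flag character followed by the numeric id; skip anything else.
        $status{$2} = $1 if $line =~ /^([YN])(\d+)\s*$/;
    }
    close($fh);
    return \%status;
}

# Build a small sample file in the same format as the dbfile.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Y12398983981293\nY23981098310983\nN98230498209480\nY23487289374987\n";
close($tmp_fh);

my $status = load_status($tmp_path);

# Each lookup is now one hash access, whatever the size of the file.
for my $id (qw(12398983981293 98230498209480 555)) {
    my $flag = $status->{$id} // 'unknown';
    print "$id => $flag\n";
}
```

A side effect is that the match becomes exact: a regex against the whole string also matches an id that is merely the start of a longer id, which a hash key lookup cannot do.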

One function I would like to add is that, while the top page is being displayed, the script gathers further results via HTTP in the background, so when the user hits the "next" page button, the results are already prepared. What is the best / safest way to make a process run in the background, where the original process doesn't wait for the background process to finish? Even if the user never hits the "next" button, I will have stocked valuable info in the database file, so I would like to do this.
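
A common pattern for this on Unix-like systems is to fork and let the child detach from the handles it inherited from the web server. The sketch below assumes a plain CGI script (the parent exits soon after replying) and a system with /dev/null; `run_in_background` and the temporary file are hypothetical names for illustration, and the real work would be the HTTP lookups.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(setsid);
use File::Temp qw(tempfile);

# Hypothetical helper: run $work in a detached child and return at once.
sub run_in_background {
    my ($work) = @_;
    STDOUT->flush;                   # so the child holds no copy of buffered output
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;             # parent: carry on without waiting

    # Child: let go of the inherited handles so the response can complete.
    open(STDIN,  '<', '/dev/null');
    open(STDOUT, '>', '/dev/null');
    open(STDERR, '>', '/dev/null');
    setsid();                        # detach from the parent's session
    eval { $work->() };              # an error must not fall back into the CGI code
    POSIX::_exit(0);                 # skip END blocks inherited from the parent
}

# Demo: the child appends one record to a file while the parent moves on.
my ($fh, $path) = tempfile(UNLINK => 1);
close($fh);
my $pid = run_in_background(sub {
    open(my $out, '>>', $path) or die "Cannot append to $path: $!";
    print {$out} "Y12398983981293\n";
    close($out);
});
print "parent continues; background child is pid $pid\n";
```

If two children can append to the same file at once, some form of locking (for example `flock`) would be needed as well; that is left out of the sketch.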

In summary, my questions are...
1) If I want to save a second or two on the parsing of the database file, what would be the best format in which to write this file, and to parse it?
2) Any advice or clues about the best way to start the background process so that the original CGI process is able to finish independently?

Any suggestions regarding any aspect of this would be appreciated. If I am way off base with my method above... please let me know... I still have time to make a total turnaround.

Kbeen.


In reply to I need speed... by kbeen
