in reply to I need speed...
I will only touch on Q1, since I have no experience with what you want to do in Q2. However, if the following notes work well enough for you, it may well turn out that you don't need to go to the arm-twisting lengths your Q2 suggests. I believe you'll see a huge difference.
First of all, your code has a few serious mistakes. :-( The biggest is that you try to emulate a hash's functionality by running a regex over a string instead of just using a hash. Let Perl do the work for you: hash lookup is very fast and efficient, and it will do the job better than any crutch you might come up with.
Here is how I would write it, with my side notes in the comments:

```perl
local $_;    # should always do this unless you know why you don't want it
my %id_cache;

open (IMAGE_LIST, $image_list)         # you didn't need quotes here..
    or die "Can't read $image_list: $!";
while (<IMAGE_LIST>) {
    chomp;                              # strip the newline so it isn't part of the key
    $id_cache{substr($_,1)} = substr($_,0,1) eq 'Y';
}
close (IMAGE_LIST);

open (IMAGE_LIST, ">>$image_list")     # ..although you do here
    or die "Can't append to $image_list: $!";
for (@thumb_id_list) {
    unless (exists $id_cache{$_}) {
        # IDs that didn't appear in the file won't exist in the hash,
        # therefore we have to connect to the partner website for them
        $id_cache{$_} = &check_image($session, $_);
        print IMAGE_LIST $id_cache{$_} ? "Y$_\n" : "N$_\n";
    }
    # $id_cache{$_} is now defined,
    # regardless of whether it was in the file or not
    push (@display_thumbs, $_) if $id_cache{$_};
}
close (IMAGE_LIST);
```
Now that's much shorter and clearer, no? It will also run a lot faster, even if from an algorithmic point of view it is still anything but satisfactory. (Note: you must take proper care of file locking, or you'll end up corrupting your image list file.) Since your IDs appear to be of fixed length, I could go into all sorts of optimizations that this knowledge makes possible, first and foremost storing the keys in a binary file to get rid of the overhead of reading and parsing a text file on every invocation.
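On the file locking note: a minimal sketch using Perl's built-in flock with the constants from Fcntl. The helper names append_result and read_cache are hypothetical, not from your code, and it assumes the same one-line-per-ID "Y<id>"/"N<id>" format. Keep in mind that flock is advisory: it only protects you against other processes that also call flock on the file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one "Y<id>" or "N<id>" line while holding an
# exclusive lock, so concurrent CGI runs can't interleave their writes.
sub append_result {
    my ($file, $id, $ok) = @_;
    open(my $fh, '>>', $file) or die "Can't open $file: $!";
    flock($fh, LOCK_EX) or die "Can't lock $file: $!";
    print $fh ($ok ? "Y$id\n" : "N$id\n");
    close($fh) or die "Can't close $file: $!";   # closing also releases the lock
}

# Hypothetical helper: read the list under a shared lock into a hashref.
sub read_cache {
    my ($file) = @_;
    my %cache;
    open(my $fh, '<', $file) or return \%cache;  # no file yet: empty cache
    flock($fh, LOCK_SH) or die "Can't lock $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        $cache{substr($line, 1)} = substr($line, 0, 1) eq 'Y';
    }
    close($fh);
    return \%cache;
}
```

The read and the append are separate critical sections, so two simultaneous requests may both check the same new ID and both append a line for it. Since the hash simply keeps the last value it sees, such duplicate lines are harmless here.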
I won't go into those fixed-length optimizations, though, because I think you're heading down the wrong road. You will soon run into trouble with this approach because you slurp the entire list into memory from scratch every time. That will get slow, but the killer argument is that your memory usage will go through the roof, something you should avoid at all costs for medium- or high-traffic CGIs. A partial solution to the speed problem would be to use Storable to save %id_cache to a binary file; this will be at least an order of magnitude faster than anything you are likely to be able to code yourself.
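A sketch of the Storable route, assuming the cache lives in its own file rather than in your text list. lock_store and lock_retrieve are Storable's variants of store and retrieve that take an exclusive or shared flock around the write or read; the helper names load_cache and save_cache are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(lock_store lock_retrieve);

# Hypothetical helper: return the cached hashref, or an empty one on the first run.
sub load_cache {
    my ($file) = @_;
    return {} unless -e $file;
    return lock_retrieve($file);    # reads the whole hash under a shared lock
}

# Hypothetical helper: write the whole hash back under an exclusive lock.
sub save_cache {
    my ($file, $cache) = @_;
    lock_store($cache, $file);
}
```

Usage would be along the lines of `my $id_cache = load_cache('id_cache.sto');`, then filling in `$id_cache->{$id}` as in the loop above, then `save_cache('id_cache.sto', $id_cache);` (the file name is made up). Note that this only removes the parsing cost: the whole hash is still loaded into memory on every request.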
However, there's still the problem of memory usage. It would be better to check your keys on disk without reading the entire list into memory. Unfortunately, doing that fast requires very clever storage, and coming up with a good way to do it is what the people at IBM and Oracle get paid a lot of money for; it is an extraordinarily tricky task. I would advise against the method others have mentioned of abusing the file system as a database. While that works, on just about every Unix filesystem the number of files you can create is limited (by the number of available inodes). Subdividing the key into a path of several short components makes that problem much worse, while keeping it as one single filename makes for very large directories, which some filesystems are pretty slow to search.
Really, you should use a database.
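If you want to stay with files for the moment, one lightweight middle ground (a different technique from the proper database I'm recommending) is a DBM file tied to a hash: each lookup reads only the relevant part of the file instead of the whole list. A sketch with the core module SDBM_File; open_id_db is a hypothetical helper name. SDBM does no locking of its own and limits each key/value pair to about 1 KB, so under real traffic DB_File, BerkeleyDB, or a database server via DBI is the better choice.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;

# Hypothetical helper: tie a hash to "$basename.pag"/"$basename.dir" on disk.
# Reads and writes through the returned hashref go to the file directly,
# so the full ID list is never held in memory.
sub open_id_db {
    my ($basename) = @_;
    my %db;
    tie(%db, 'SDBM_File', $basename, O_RDWR | O_CREAT, 0666)
        or die "Can't tie $basename: $!";
    return \%db;
}
```

Lookups then look just like the hash version: `exists $db->{$id}` and `$db->{$id} = check_image(...) ? 1 : 0`, followed by `untie %$db` when the request is done.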
Also, accumulate the images whose permission status is unknown rather than calling &check_image() on each one individually, and use LWP::Parallel to check them all at once. This should save your script a couple of seconds of waiting for each reply in turn; just make sure you don't cripple the target server with a flood of requests (if memory serves, the module lets you define how many requests to fire off at once).
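The accumulation step can be sketched like this. resolve_ids is a hypothetical helper, and $check_batch is a hypothetical callback that receives all unknown IDs at once and returns a hashref of id => true/false; that callback is where the parallel requests would go (for example via LWP::Parallel::UserAgent, whose max_req setting caps the number of simultaneous requests per host).

```perl
use strict;
use warnings;

# Hypothetical helper: split IDs into cached and unknown ones, resolve ALL
# unknown IDs with a single batch call, and return the displayable IDs
# in their original order.
sub resolve_ids {
    my ($ids, $cache, $check_batch) = @_;
    my @unknown = grep { !exists $cache->{$_} } @$ids;
    if (@unknown) {
        my $results = $check_batch->(\@unknown);   # one batch, not one call per ID
        $cache->{$_} = $results->{$_} ? 1 : 0 for @unknown;
    }
    return grep { $cache->{$_} } @$ids;
}
```

Because the batch call sees every unknown ID up front, it is free to fire the requests concurrently, while the cache update and the display filter stay as simple as in the loop above.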