Random Image Grabber

bladx has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

For a personal project, I want to learn how to write a program that grabs a set number of random images from specified servers (and from random directories that are not locked out from the public). After it does that, I want it to output the <img> tags of each picture it found, on a plain white HTML page, as the final output.

It seems simple enough, but I would like feedback on my programming flow chart, since that way I would be able to work on the actual code part later without having to redo it over and over. (Getting it correct in fewer retries.)

Here is the flow chart I currently have:

1. Initialize variables.
2. Start a loop, using the number of servers you chose as the maximum number of times to execute the loop.
3. Open connection with remote server.
4. Find a random available directory.
5. Search for any .gifs, .jpgs, or .pngs to possibly grab.
6. Randomly choose one, and save that link somewhere for use at the end of the script.
7. Close connection with current remote server.
8. Go back to #2 if there are more iterations left to run; otherwise, keep going.
9. Exit program safely.
10. Output a plain .html file showing the results of what you had it find.
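
As a rough illustration of the "randomly choose one" and "output a plain .html file" steps only, here is a minimal Perl sketch. Everything named in it (the %found data, pick_random, escape_attr, results_page, the example.com URLs) is hypothetical and made up for the example; it assumes the fetching steps have already produced a list of image URLs per server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: for each server, the image URLs found there.
# In the real program these lists would come from the fetching steps.
my %found = (
    'http://example.com/' => [ 'http://example.com/pics/a.gif',
                               'http://example.com/pics/b.jpg' ],
    'http://example.org/' => [ 'http://example.org/img/c.png' ],
);

# Pick one element of a list at random; returns nothing for an empty list.
sub pick_random {
    my @items = @_;
    return unless @items;
    return $items[ rand @items ];
}

# Escape the characters that matter inside a double-quoted HTML attribute.
sub escape_attr {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build a plain white page with one <img> tag per URL.
sub results_page {
    my @urls = @_;
    my $imgs = join '',
        map { '<img src="' . escape_attr($_) . qq{">\n} } @urls;
    return qq{<html>\n<head><title>Random images</title></head>\n}
         . qq{<body bgcolor="#ffffff">\n$imgs</body>\n</html>\n};
}

# One random image per server, then the page goes to standard output.
my @chosen = grep { defined }
             map  { pick_random( @{ $found{$_} } ) } sort keys %found;
print results_page(@chosen);
```

Redirecting the output to a file (for example `perl grab.pl > results.html`, where the script name is again just a placeholder) gives the final page.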

One of the reasons this project became something that is interesting to me is: at a MacHack conference one year, I heard about this program that did essentially the same thing as what I want to do, however it grabbed random images off of an airport network. Anyways, I wanted to be able to write something similar in Perl.

If there is something I am totally missing in that flow chart for what I need this simple program to do, please let me know about it. It's just the flow chart I have come up with, and hasn't been refined/added to yet by anyone other than me. I would appreciate a different perspective on it.

Thanks!

Andy Summers

Re: Random Image Grabber by larsen (Parson) on Aug 12, 2001 at 16:19 UTC
Seems good to me. You might be interested in a different approach based on LWP::Parallel::UserAgent, though. merlyn wrote a column about this topic.
Re (tilly) 1: Random Image Grabber by tilly (Archbishop) on Aug 12, 2001 at 19:23 UTC
Serious advice. Skip the flowchart.

In it you have steps like, "initialize variables". But if you can avoid it, you don't want to have a ton of variables that need initializing. If you must, you must, but you would prefer not to.

Instead go bottom up. Write a function that you know you will need. For instance, given a server and a directory, look for images in it. You need that function. It is not hard to write. You can write it without needing any global variables.

When you get done you will have cleaner code that is better factored and easier to modify. For instance, you have in your flowchart big grey steps like, "Choose a random directory." How? Guess whether it has something like images? Hope it doesn't have an index.html file so you can get a directory listing? Start following random pages looking for links? My guess is that once you are done you will find that that logic needs fixing...

And yes, you did miss something. Something big. You are writing a robot. It is impolite in the extreme to write a robot that fails to look for robots.txt and respect what that file asks you to do. That will require significant changes to your overall logic. What changes? Don't worry about it just yet. Focus on how to use LWP first...
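
One possible shape for the "given a server and a directory, look for images in it" function, as a hedged sketch rather than anything tilly actually posted. It assumes the libwww-perl and HTML::Parser distributions are installed (LWP::RobotUA, HTML::LinkExtor), and that the directory URL returns an HTML listing or page. The function names image_links and images_in_directory, the agent string, and the contact address are all placeholders. LWP::RobotUA is used because it fetches robots.txt itself and refuses requests that file disallows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;       # a user agent that fetches and obeys robots.txt
use HTML::LinkExtor;    # pulls link attributes out of HTML

# Return the absolute URLs of .gif/.jpg/.jpeg/.png links found in $html.
# $base is the URL the HTML came from; relative links are resolved against it.
sub image_links {
    my ($html, $base) = @_;
    my %seen;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            for my $url (values %attr) {
                $seen{$url} = 1 if $url =~ /\.(?:gif|jpe?g|png)$/i;
            }
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return sort keys %seen;
}

# Fetch one directory URL and return the image URLs it links to.
# Returns an empty list if the fetch fails or robots.txt forbids it.
sub images_in_directory {
    my ($ua, $dir_url) = @_;
    my $response = $ua->get($dir_url);
    return unless $response->is_success;
    return image_links( $response->decoded_content // '', $response->base );
}

# Only touches the network when a URL is given on the command line.
if ( my $dir_url = shift @ARGV ) {
    # Placeholder agent name and contact address; use your own.
    my $ua = LWP::RobotUA->new( 'random-image-grabber/0.1', 'you@example.com' );
    $ua->delay( 1 / 60 );    # delay is in minutes: at least 1 second per host
    print "$_\n" for images_in_directory( $ua, $dir_url );
}
```

Note that neither function needs a global variable: the user agent and the URL are passed in, and the parsing half can be tried on a literal HTML string without any network access.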
Re: Re (tilly) 1: Random Image Grabber by bladx (Chaplain) on Aug 13, 2001 at 00:29 UTC
I greatly appreciate the time everyone spent replying to my questions about this project, especially tilly. Thank you for pointing out a much cleaner and more efficient way to even set up the flowchart :) I have to remember to work from the bottom up instead of the old way I learned at school in previous years. I am somewhat naive on the subject of robots, and completely forgot that this was actually a small robot. After I learn how to use LWP efficiently, I will work on the finer points of how a robot should behave. I believe I even read an article somewhere on how robots should conduct their business, but I have forgotten where; I'll just search for that. Anyways, thanks for the great input, and I will begin the project asap!

Andy Summers
Re: Random Image Grabber by John M. Dlugosz (Monsignor) on Aug 13, 2001 at 08:45 UTC
I agree about the steps like "initialize variables". What variables? Everything should be encapsulated! Think "functional decomposition" and think "encapsulation". The robots file can be important. You can get stuck in a loop otherwise, chasing down dynamically generated pages that have different URLs but are the same.
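
Respecting robots.txt is the first defence against that kind of loop. As a complement (this is an assumption on top of the reply, not something it describes), a robot can also remember what it has already seen and skip a page whose URL or whose body matches an earlier one. A minimal sketch using only core modules follows; make_seen_checker and the sample URLs are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a checker that remembers every URL and every page body it is shown.
# The state lives in the closure, so no global variables are needed.
# $body should be raw bytes (for example $response->content from LWP),
# because md5_hex cannot digest strings containing wide characters.
sub make_seen_checker {
    my ( %seen_url, %seen_body );
    return sub {
        my ( $url, $body ) = @_;
        return 1 if $seen_url{$url}++;                  # same URL again
        return 1 if $seen_body{ md5_hex($body) }++;     # same content, new URL
        return 0;                                       # genuinely new page
    };
}

# Small demonstration with made-up pages.
my $already_seen = make_seen_checker();
for my $page (
    [ 'http://example.com/cal?month=1', 'same page' ],
    [ 'http://example.com/cal?month=2', 'same page' ],
    [ 'http://example.com/about',       'different page' ],
) {
    my ( $url, $body ) = @$page;
    print $already_seen->( $url, $body ) ? "skip  $url\n" : "visit $url\n";
}
```

In the demonstration the second URL is skipped because its body is byte-for-byte identical to the first one. Pages that differ only by a timestamp or a counter would slip past this check, so a cap on the number of pages fetched per server is still worth having.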