in reply to •Re: Creating loop on undefined hash key value
in thread Creating loop on undefined hash key value

Quick background: I'm building a simple site index builder (no 'use lingua'). Maybe this time I'll actually finish it.

I am (a little) familiar with Robot and SimpleRobot, but to my knowledge and past experience they do not handle dynamic, script-generated pages, which is what I need.

Robot and SimpleRobot will not read a page with a URL like:

http://www.someserver.com/cgi-bin/template.pl?content=foo.htm

I'm working with a subset of code written by Rob_au from his SiteRobot.pm (you've seen it before). His code works great (it returns script URLs), but it passes back only a one-dimensional array of page URLs, and I wanted to build on this. I want to retrieve more things like the title, body, creation date, etc. Rob's code already retrieves this information, but it uses it only for validity checking and then throws it away.

My twist was to use a hash instead of his array: since a hash cannot hold duplicate keys, keying on the URL keeps the listing of URLs unique. I thought about using an AoH (array of hashes), but that's too messy (I'd have to build in duplicate checking, code to pull the hashes out of the array, etc.). À la K.I.S.S.
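A minimal sketch of the hash-based idea, assuming each crawled page yields a URL plus a title, body and creation date. Using the URL as the key gives de-duplication for free, and storing a hash reference as the value keeps the extra fields together (a hash of hashes rather than an AoH). The `%index` name, the `add_page` helper, its field names and the sample values are hypothetical; none of this is taken from Rob_au's SiteRobot.pm.

```perl
use strict;
use warnings;

# Hypothetical index: maps each URL to a hash reference of page details.
# Hash keys are unique, so a URL can only ever appear once.
my %index;

# Hypothetical helper: record a page; the first time a URL is seen wins.
# Returns 1 if the page was added, 0 if the URL was already indexed.
sub add_page {
    my ($url, %fields) = @_;
    return 0 if exists $index{$url};    # duplicate URL: keep the existing entry
    $index{$url} = {
        title   => $fields{title},
        body    => $fields{body},
        created => $fields{created},
    };
    return 1;
}

# Sample values for illustration only.
add_page('http://www.someserver.com/cgi-bin/template.pl?content=foo.htm',
         title => 'Foo', body => 'body text', created => '2002-11-23');

# The same URL found again later in the crawl is ignored.
add_page('http://www.someserver.com/cgi-bin/template.pl?content=foo.htm',
         title => 'Foo again', body => 'body text', created => '2002-11-23');

# Walk the index in sorted URL order.
for my $url (sort keys %index) {
    print "$url\t$index{$url}{title}\n";
}
```

If later visits should update a page rather than be ignored, the `exists` check is the one line to change.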

So now that you see my quandary a bit more clearly, is there any more information you can provide?

TIA

======================
Sean Shrum
http://www.shrum.net

Re: Re: •Re: Creating loop on undefined hash key value
by theorbtwo (Prior) on Nov 23, 2002 at 18:33 UTC

    It shouldn't be difficult to modify Robot or SimpleRobot to /not/ filter out URLs with GET parameters. Be careful, though: there are times you definitely /don't/ want to follow such links, such as when requesting them casts a vote or triggers some other action.

    There's probably a line that searches for a ? in the URL and rejects it. It's probably even commented.
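For illustration, a filter of the kind described above might look like the sketch below. It is not the actual code from Robot or SimpleRobot; `follow_url`, the `$allow_query` switch and the `@unsafe` patterns are hypothetical. It shows the "reject anything with a ?" rule next to a relaxed version that lets query-string URLs through while still skipping links that look like actions.

```perl
use strict;
use warnings;

# Hypothetical switch: 0 mimics the "reject any URL containing ?" rule,
# 1 lets script-generated pages such as template.pl?content=foo.htm through.
my $allow_query = 1;

# Hypothetical patterns for links that should never be followed automatically,
# because requesting them has side effects (casting a vote, logging out, ...).
my @unsafe = (qr/\bvote\b/i, qr/\blogout\b/i, qr/\bdelete\b/i);

# Returns true if the crawler should follow $url.
sub follow_url {
    my ($url) = @_;
    # Everything after the first '?' is the GET query string, if any.
    my (undef, $query) = split /\?/, $url, 2;
    return 1 unless defined $query;     # no query string: always follow
    return 0 unless $allow_query;       # old behaviour: reject any query string
    for my $re (@unsafe) {
        return 0 if $url =~ $re;        # looks like an action link: skip it
    }
    return 1;
}

for my $url ('http://www.someserver.com/index.html',
             'http://www.someserver.com/cgi-bin/template.pl?content=foo.htm',
             'http://www.someserver.com/cgi-bin/poll.pl?vote=3') {
    printf "%-60s %s\n", $url, follow_url($url) ? 'follow' : 'skip';
}
```

A pattern list like `@unsafe` is only a heuristic; a site that exposes actions through innocent-looking parameters would still need its own exclusions.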


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).