Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This is a two-part question.

I want to submit to 9,000 forms (all across one domain) and it's not spamming, just so you know.

Question 1: How do I LOG IN and STORE THE COOKIES so I can access these 9,000 pages? I know some of you will post links to a FAQ or something, but I've tried this a number of times in the past with separate scripts and have never managed to scrape pages OR submit data to pages that required a login first.

Question 2: Since this is 9,000 pages and I don't know fork(), there's no way this could run as a CGI script without timing out. I don't think the script would be THAT fast!

So the question is this: I think breaking it into three scripts would be best. One login form sends data to the other two scripts. One script submits the data to the first half of the 9,000 pages and the second submits to the other half. The only problem is that with two scripts I wouldn't be able to see live updates on which page it's up to.

So any ideas on how this would work? It doesn't necessarily have to print out a message for each submission, maybe one for every 10 or 100, just so you know it's still running.

Any help with the cookies and ideas with question #2 would be very helpful!

Re: automated form processing with cookies
by merlyn (Sage) on Jun 08, 2005 at 19:18 UTC
    "I want to submit to 9,000 forms (all across one domain) and it's not spamming, just so you know."
    My spider sense is tingling.

    Anything that would require that many form submissions would be better done by eliminating the web element and going directly to the resulting database. Have you tried just modifying the database, presuming it's yours? And if it's not yours, why the hell are you submitting 9000 forms? Besides, Perl can only handle 8192 forms in one session.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Actually this isn't mine, it's for a client who wants this done. I'm pretty sure this is his site but he/she said it's not spam and that's enough for me.

      It's interesting that Perl can handle only 8192 forms in one session. It doesn't make much sense to me why there would be such a limitation, but it's not as if submitting to this many is common anyway. A workaround would be to submit to the first script, then link to the second one after the first half is completed. The second script would initiate the login procedure again and complete the rest.

        Then if it's their site tell them that this is a really brane dead way to go about doing things and that they'd be much better served by using approach Y rather than BFI approach X. If you get the job done for them better / faster / more efficiently (and it's really legit) they'll be more likely to use your services again.

        --
        We're looking for people in ATL

        "...but he/she said it's not spam and that's enough for me."

        The quote below came to mind right after reading your comment...

        "...then I ram my ovipositor down your throat and lay my eggs in your chest - But I'm not an alien!" -Tom Servo (MST3K: The Movie)

        I'm somewhat deranged, though, so you're probably fine taking this person's word for it.

Re: automated form processing with cookies
by cmeyer (Pilgrim) on Jun 08, 2005 at 19:18 UTC

    Your post is a little vague. I'm left wondering what exactly you are trying to accomplish, and why you are doing it from a CGI script.

    When asking questions about programming, be prepared to learn something. Sometimes the best answer to a question like your number two is "learn fork()".
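
    For what it's worth, fork() is less scary than it sounds. Below is a minimal sketch of splitting a list of page IDs across a few child processes, each of which prints a progress line every 100 items. The helper names (split_work, run_parallel) and the stub worker are made up for illustration; a real worker would do the actual form submission.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT so progress lines show up immediately

# Split a list into $n roughly equal, contiguous chunks (array refs).
sub split_work {
    my ($n, @items) = @_;
    my @chunks = map { [] } 1 .. $n;
    my $per = int((@items + $n - 1) / $n) || 1;    # ceiling division
    push @{ $chunks[ int($_ / $per) ] }, $items[$_] for 0 .. $#items;
    return @chunks;
}

# Fork one child per chunk; each child calls $worker->($item) for every
# item in its chunk. Returns the number of children that exited cleanly.
sub run_parallel {
    my ($worker, @chunks) = @_;
    my @pids;
    for my $chunk (@chunks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                           # in the child
            my $done  = 0;
            my $total = @$chunk;
            for my $item (@$chunk) {
                $worker->($item);
                $done++;
                print "[$$] $done/$total done\n" if $done % 100 == 0;
            }
            exit 0;
        }
        push @pids, $pid;                          # in the parent
    }
    my $ok = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);                          # wait for each child
        $ok++ if $? == 0;
    }
    return $ok;
}

# Demo: perl forkdemo.pl 9000
# The worker here is a stub that does nothing; replace it with real work.
if (@ARGV) {
    my @chunks = split_work(2, 1 .. $ARGV[0]);
    my $ok = run_parallel(sub { my ($id) = @_; return 1 }, @chunks);
    print "$ok of ", scalar(@chunks), " children finished cleanly\n";
}
```

    Run from a shell rather than as a CGI, this has no web server timeout to worry about, and both children write their progress to the same terminal.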

    LWP::Parallel may help you. You'll need to be familiar with the use of LWP, for sending cookies, form POSTs and the like.
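
    As a rough sketch of the plain LWP part (logging in, keeping the cookies, then posting to each form), something along these lines might be a starting point. The URLs, the form field names (username, password, comment) and the cookies.txt filename are all assumptions made up for the example; the real site will have its own. It also assumes the site answers a good login with either a 2xx response or a redirect, which is common but not guaranteed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# ASSUMPTIONS: hypothetical URLs; substitute the real site's.
my $LOGIN_URL = 'http://www.example.com/login';
my $FORM_URL  = 'http://www.example.com/form/%d';

# Build a user agent with a cookie jar, so the session cookie set at
# login is sent back automatically on every later request.
sub make_agent {
    my ($cookie_file) = @_;
    my $jar = HTTP::Cookies->new(
        file           => $cookie_file,
        autosave       => 1,
        ignore_discard => 1,    # also save session-only cookies to the file
    );
    return LWP::UserAgent->new(cookie_jar => $jar, timeout => 30);
}

# URL of the form for a given page number.
sub form_url { return sprintf $FORM_URL, $_[0] }

# POST the login form. The field names are assumptions.
sub login {
    my ($ua, $user, $pass) = @_;
    my $res = $ua->post($LOGIN_URL, { username => $user, password => $pass });
    return $res->is_success || $res->is_redirect;
}

# POST one form; returns true on a 2xx response.
sub submit_form {
    my ($ua, $id, $fields) = @_;
    return $ua->post(form_url($id), $fields)->is_success;
}

# Usage: perl submit.pl USER PASS
if (@ARGV == 2) {
    $| = 1;                     # show progress lines immediately
    my $ua = make_agent('cookies.txt');
    login($ua, @ARGV) or die "login failed\n";
    my $failed = 0;
    for my $id (1 .. 9000) {
        submit_form($ua, $id, { comment => 'hello' }) or $failed++;
        print "$id submitted ($failed failed)\n" if $id % 100 == 0;
    }
}
```

    The cookie jar is the piece that usually goes missing in scripts that "can't stay logged in": without it, each request starts a fresh session.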

    -Colin.

    WHITEPAGES.COM | INC

Re: automated form processing with cookies
by Anonymous Monk on Jun 08, 2005 at 21:46 UTC
    No, no, you should be using Cookie::Jar; it will allow you to store as many cookies as you like. It also has some functions that you really cannot do without, like dont_get_caught and get_oreos_only.