First, sorry for the long post and for how clueless it is.

I have set myself the task of creating a script that collects data from web pages and inserts it into a MySQL database. I'm a complete noob at this, though, and not even sure what language I need (to learn), but I think Perl might be it. I'm not asking you to tell me how to do it, only whether it's feasible or whether I'm barking up the wrong tree (pointers on where to find relevant information are welcome, though).

The first step would be to export a list of pids to be processed, each paired with the last sid processed for that pid.
The script would read the list and set the first pid in the list as current.
Next, it would add the current pid to a URL and load that page, which contains a list of sids.
From this page, sids need to be collected until I hit the "last processed" one. These might be spread over several pages, so the script needs to keep going until it either finds the "last processed" sid or there are no further pages to load (a failure, I guess).
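
A rough sketch of the paging loop described above, in Perl. Everything about the page format is an assumption: the `sid=NNN` link pattern, the page numbering, and the `collect_new_sids` name are made up for illustration. The fetch step is passed in as a code ref so the loop can be tried without a network; in a real script it would presumably wrap something like LWP::UserAgent.

```perl
use strict;
use warnings;

# Collect sids for one pid, in page order, stopping at the last processed sid.
# $fetch is a code ref taking (pid, page number) and returning that page's
# HTML, or undef when there is no such page (hypothetical interface).
# Returns a flag (1 = marker found, 0 = ran out of pages) followed by the sids.
sub collect_new_sids {
    my ($pid, $last_sid, $fetch) = @_;
    my @new_sids;
    my $page = 1;

    while (defined(my $html = $fetch->($pid, $page))) {
        # Assumed markup: every sid shows up in a link as "sid=12345".
        my @sids = $html =~ /sid=(\d+)/g;
        last unless @sids;    # an empty page also counts as "no more pages"

        for my $sid (@sids) {
            # Marker reached: everything collected so far is new.
            return (1, @new_sids) if $sid == $last_sid;
            push @new_sids, $sid;
        }
        $page++;
    }

    # No further pages and the marker never appeared: the failure case.
    return (0, @new_sids);
}

# Canned pages standing in for the real site.
my %pages = (
    1 => '<a href="view?sid=105">x</a> <a href="view?sid=104">y</a>',
    2 => '<a href="view?sid=103">z</a> <a href="view?sid=102">w</a>',
);
my $fake_fetch = sub { my ($pid, $page) = @_; return $pages{$page} };

my ($found, @sids) = collect_new_sids(42, 103, $fake_fetch);
print "found marker: $found, new sids: @sids\n";
```

With the canned pages this stops on page 2 when it meets sid 103 and reports 105 and 104 as new. Returning the flag separately makes it possible to tell "nothing new" apart from "marker never found".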

Next comes the new sid list created in the previous step. Each sid needs to be processed: some basic data is collected from each sid page, and then two possible (but not always existent) lists.
The basic data collected for the sid contains two values to be stored as variables; these decide how many data blocks need to be collected lower down on the page.
Go to the first type of block, collect the data I want, and repeat as many times as the corresponding variable says.
Go to the second type of block and do the same.
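
As a minimal sketch of that per-sid step in Perl: read the two counts first, then collect up to that many blocks of each type. The markup here (`count1`/`count2` spans, `type1`/`type2` divs) and the `parse_sid_page` name are invented for the example, since the real pages are not shown. Regex matching on HTML is fragile; a proper HTML parser from CPAN, such as HTML::TreeBuilder, would likely hold up better on real pages.

```perl
use strict;
use warnings;

# Parse one sid page (hypothetical markup) into counts and data blocks.
sub parse_sid_page {
    my ($html) = @_;

    # The two values from the basic data; treat a missing one as 0,
    # since the lists are not always present.
    my ($n1) = $html =~ /<span id="count1">(\d+)<\/span>/;
    my ($n2) = $html =~ /<span id="count2">(\d+)<\/span>/;
    $n1 = 0 unless defined $n1;
    $n2 = 0 unless defined $n2;

    # Grab every block of each type, in page order.
    my @type1 = $html =~ /<div class="type1">(.*?)<\/div>/gs;
    my @type2 = $html =~ /<div class="type2">(.*?)<\/div>/gs;

    # Keep no more blocks than the counts announce.
    splice(@type1, $n1) if @type1 > $n1;
    splice(@type2, $n2) if @type2 > $n2;

    return {
        count1 => $n1,
        count2 => $n2,
        type1  => \@type1,
        type2  => \@type2,
    };
}

# A made-up page with two type 1 blocks and no type 2 list.
my $sample = join "\n",
    '<span id="count1">2</span> <span id="count2">0</span>',
    '<div class="type1">alpha</div>',
    '<div class="type1">beta</div>';

my $data = parse_sid_page($sample);
print "type1: @{ $data->{type1} }\n";
print "type2 blocks: ", scalar @{ $data->{type2} }, "\n";
```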

Store the collected data in a text file named after the pid. It should contain 4 sections of data to be inserted into 4 databases.

First section: update the pid with the new last processed sid.
Second section: add the sids with their info to the DB.
Third section: add the data from type 1 blocks on sid pages to the DB.
Fourth section: add the data from type 2 blocks on sid pages to the DB.
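
One possible shape for that file, sketched in Perl with core modules only. The section names (`pid_update`, `sids`, `type1`, `type2`), the semicolon-separated line format, and the `write_pid_file` helper are all assumptions for illustration; any layout that a later import step can read back would do.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write one text file named after the pid, with four labelled sections.
# $sections is a hash ref of array refs, one list of lines per section.
sub write_pid_file {
    my ($dir, $pid, $sections) = @_;
    my $path = File::Spec->catfile($dir, "$pid.txt");

    open my $out, '>', $path or die "Cannot write $path: $!";
    for my $name (qw(pid_update sids type1 type2)) {
        print {$out} "[$name]\n";
        # A missing section is written as an empty one.
        print {$out} "$_\n" for @{ $sections->{$name} || [] };
        print {$out} "\n";
    }
    close $out or die "Cannot close $path: $!";
    return $path;
}

# Example run into a temporary directory that is removed on exit.
my $dir  = tempdir(CLEANUP => 1);
my $path = write_pid_file($dir, 42, {
    pid_update => ['42;105'],
    sids       => ['105;some info', '104;other info'],
    type1      => ['105;alpha', '105;beta'],
    type2      => [],
});
print "wrote $path\n";
```

Keeping each section as plain lines means the file can be inspected by hand before anything is loaded into MySQL.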

Close the file, load the next pid from the list, and repeat the process until the pid list is empty.

I guess a bonus at the end would be if it could also insert all the collected data into the DB.

Is this something Perl would be suitable for, or is there a better choice?
My system is Windows 7 64-bit, btw, running MySQL 5.1 and Strawberry Perl 5.12.

TIA
Thomas

In reply to collect data from web pages and insert into mysql by SteinerKD
