LostS has asked for the wisdom of the Perl Monks concerning the following question:

Hey Everyone Once More,

OK, I need to write a system that will go to a web site, look at all the links that meet certain criteria, follow each one, grab specific data, and then write that data to my DB. The site is:

http://boards.gamers.com/messages/overview.asp?name=rzooc&page=1

As you can see, this is a forum, so I need to grab the messages and pretty much move them to my system. I am attempting to move to the WWWThreads system.

So, any suggestions or comments on where to start and how to go about this?



-----------------------
Billy S.
Slinar Hardtail - Guildless
Datal Ephialtes - Guildless
RallosZek.Net Admin/WebMaster
Aerynth.Net Admin/WebMaster

perl -e '$cat = "cat"; if ($cat =~ /\143\x61\x74/) { print "Its a cat!\n"; } else { print "Thats a dog\n"; } print "\n";'

Re: Help Writing a System to Grab Data from a Site
by blakem (Monsignor) on Aug 29, 2001 at 01:35 UTC
    If this is an authorized activity, why don't you slurp it from the actual data source (i.e. database, text files, etc.)? It is generally much easier to do that, because HTML scrapers are difficult to write and tend to break easily with minor HTML variances.

    -Blake

      Well, I did contact them and ask for a dump of the forums. They said I would have to write a script to grab the data from the web site, so I assume that is them giving me permission to do so.


      Also, check out w3mir: http://langfeldt.net/w3mir/
Re: Help Writing a System to Grab Data from a Site
by dga (Hermit) on Aug 29, 2001 at 01:40 UTC

    If you have to scrape the HTML (see the other reply about the troubles with that), then the LWP modules are where you want to start looking: LWP::Simple if the job is, well, 'simple', and the others (such as LWP::UserAgent) if not.
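
    A minimal sketch of the LWP::Simple route, assuming the overview pages are numbered 1, 2, 3, ... through the page parameter of the URL in the question (that numbering is a guess, not something the site has confirmed). The helper names overview_url and fetch_page are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Build the overview URL for one page. The base URL is copied from the
# question; the assumption is that only the page number changes.
sub overview_url {
    my ($page) = @_;
    return sprintf
        'http://boards.gamers.com/messages/overview.asp?name=rzooc&page=%d',
        $page;
}

# Fetch one page. LWP::Simple's get() returns undef on any failure.
sub fetch_page {
    my ($url) = @_;
    my $html = get($url);
    warn "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Only touch the network when a page count is given on the command line.
if (@ARGV) {
    for my $page (1 .. $ARGV[0]) {
        my $html = fetch_page(overview_url($page));
        next unless defined $html;
        printf "page %d: %d bytes\n", $page, length $html;
    }
}
```

    Saved under a name of your choosing, it would be run with the number of overview pages to fetch as its only argument. If the site turns out to need cookies, a login, or custom headers, that is the point to move from LWP::Simple to LWP::UserAgent.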

Re: Help Writing a System to Grab Data from a Site
by Anonymous Monk on Aug 29, 2001 at 20:08 UTC
    Try calling lynx from Perl, for example: lynx -dump -traverse http://www.url.com >> file.html
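
    One way that could look from inside a Perl script, as a rough sketch: it assumes lynx is installed and on the PATH, and it uses only the -dump flag (add -traverse as in the command above if you want lynx to follow links itself). The helper names lynx_command and lynx_dump are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Argument list for lynx; -dump prints the rendered page to stdout.
sub lynx_command {
    my ($url) = @_;
    return ('lynx', '-dump', $url);
}

# Run lynx on one URL and return everything it printed as a single string.
sub lynx_dump {
    my ($url) = @_;
    # The list form of a pipe open bypasses the shell, so characters such
    # as & and ? in the URL need no quoting.
    open(my $fh, '-|', lynx_command($url))
        or die "Cannot run lynx: $!\n";
    local $/;    # slurp mode: read the whole output at once
    my $text = <$fh>;
    close($fh) or warn "lynx exited with status $?\n";
    return $text;
}

# Usage: pass the URL to dump as the first command-line argument.
print lynx_dump($ARGV[0]) if @ARGV;
```

    Note that lynx -dump produces formatted plain text rather than HTML, so this suits grabbing the readable message text more than parsing page structure.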
Re: Help Writing a System to Grab Data from a Site
by Amoe (Friar) on Aug 29, 2001 at 20:56 UTC
    I had to do something like this, and I used HTML::TokeParser, which I had to teach myself in my advanced state of newbieness. Luckily, though, the newbies of today can use crazyinsomniac's excellent tutorial.

    -- sub version { print "I cuss your capitalist pig tendencies bad!"; }
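
    For the "links that meet certain criteria" part of the question, a small HTML::TokeParser sketch might look like the following. The sample HTML and the message.asp?id= link pattern are invented for illustration; the real forum's markup will differ, so the pattern has to be adapted after looking at the actual page source. matching_links is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Return [href, link text] pairs for every <a> whose href matches $pattern.
sub matching_links {
    my ($html, $pattern) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser\n";
    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # attribute hash of the start tag
        next unless defined $href && $href =~ $pattern;
        # Text up to the closing </a>, with whitespace collapsed.
        my $text = $parser->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

# Invented sample markup standing in for a fetched overview page.
my $sample = <<'HTML';
<html><body>
<a href="overview.asp?page=2">Next page</a>
<a href="message.asp?id=101">First thread</a>
<a href="message.asp?id=102"> Second   thread </a>
</body></html>
HTML

for my $link (matching_links($sample, qr/message\.asp\?id=\d+/)) {
    print "$link->[0]\t$link->[1]\n";
}
```

    Each href that comes back can then be fetched in turn and parsed the same way to pull out the message body before it is written to the database.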