http://qs1969.pair.com?node_id=349974

Agyeya has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I am new to Perl. I want to write a Perl program which can scrape information from a webpage (say, a table showing stock or share prices) and add it into proper tables (using MySQL). I want to do this every two minutes. Also, if there is any change in the tables, I want to record the changes in another table. I am using Red Hat 9.2 as my operating system. Thanks in advance

Replies are listed 'Best First'.
Re: scraping from HTTP page to MySql table
by b10m (Vicar) on May 03, 2004 at 10:43 UTC

    Hello and welcome to Perlmonks,

    This can certainly be done, but unfortunately, Perlmonks is not a place where people will build scripts for you. We can help you out with problems, but you'll need to actually code most of it yourself.

    To start, you probably want to learn more about Perl, and this is a good place for that. Besides that, books are always good to have around, and many monks would advise Learning Perl and Programming Perl (both O'Reilly books).

    After that, you might want to look at these modules:

    --
    b10m

    All code is usually tested, but rarely trusted.
      The page I want to scrape information from comes up in a JavaScript popup window. Now how do I link to this window?
Re: scraping from HTTP page to MySql table
by matija (Priest) on May 03, 2004 at 10:41 UTC
    Learn about CPAN.

    To fetch the web page, you could use LWP::Simple or LWP::UserAgent. To parse the page and extract the data, you might be able to use HTML::TableExtract or HTML::Parser.
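    For example, a minimal sketch along those lines (the URL and the column headers here are made up; adjust them to the real page) could look like this:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::Simple qw(get);
        use HTML::TableExtract;

        my $url  = 'http://example.com/prices.html';   # placeholder URL
        my $html = get($url) or die "Could not fetch $url\n";

        # Grab the table whose header row contains these columns
        my $te = HTML::TableExtract->new( headers => [ 'Symbol', 'Price' ] );
        $te->parse($html);

        for my $table ( $te->tables ) {
            for my $row ( $table->rows ) {
                my ( $symbol, $price ) = @$row;
                print "$symbol => $price\n";
            }
        }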

    Once you have the data you need, you can save it to a MySQL database using either Class::DBI (if you are Object Oriented) or DBD::mysql, if you like to live closer to the bare metal (both use DBI). You have enough material now, I think. Start writing the script, and if you have problems, ask well-thought-out questions and we'll help you solve them.
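    Once the rows are extracted, inserting them with DBI/DBD::mysql is short as well. This is only a sketch; the DSN, credentials, and table layout are assumptions:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect(
            'DBI:mysql:database=stocks;host=localhost',   # assumed DSN
            'user', 'password',
            { RaiseError => 1, AutoCommit => 1 },
        );

        my $insert = $dbh->prepare(
            'INSERT INTO prices (symbol, price, fetched_at) VALUES (?, ?, NOW())'
        );

        # @rows would come from the HTML parsing step above
        my @rows = ( [ 'FOO', 12.34 ], [ 'BAR', 56.78 ] );
        $insert->execute(@$_) for @rows;

        $dbh->disconnect;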

Re: scraping from HTTP page to MySql table
by z3d (Scribe) on May 03, 2004 at 12:55 UTC
    Like the posts before me, I won't offer code, only recommendations and insight. I would start by warning you - unless you run the website you are scraping, or have an existing relationship with the owners, you may want to think twice about a direct scraping every two minutes. Not everyone appreciates having their website hit repeatedly and consistently to scrape data.

    In addition to the modules already mentioned, I'd also recommend reading through past articles. I know that both perl.com and TPJ have run articles about exactly this; perl.com's was in the last few months (so it might still be found on their front page, not sure).



    "I have never written bad code. There are merely unanticipated features."
Re: scraping from HTTP page to MySql table
by Ryszard (Priest) on May 03, 2004 at 14:13 UTC
    You know, it's more complicated (and more powerful, IMO), but I've just learned HTML::TokeParser, which is now my preferred HTML parser.

    If you want some example code, check out jeffa's excellent IMDB::Movie.
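    For the flavour of it, here is a bare-bones sketch (the URL is a placeholder) that walks a page and prints the text of every table cell:

        use strict;
        use warnings;
        use LWP::Simple qw(get);
        use HTML::TokeParser;

        my $html = get('http://example.com/prices.html')
            or die "fetch failed\n";
        my $p = HTML::TokeParser->new( \$html );

        # Step through the document one <td> at a time
        while ( my $tag = $p->get_tag('td') ) {
            my $text = $p->get_trimmed_text('/td');
            print "$text\n";
        }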

Re: scraping from HTTP page to MySql table
by chanio (Priest) on May 03, 2004 at 18:21 UTC
    In order to know when to re-check the site for changes, you'd do better to ask its webmaster at what hours she updates the site. You could even suggest that she publish the changes to a newsfeed site (SourceForge has this) like syndic8*.

    Then, to be notified of those changes, you would fetch an XML file (RSS or RDF) that specifies which articles have changed, or simply tells you that you should re-check the site.

    There are also Perl modules to extract the RSS info from those files and even download them at a specified frequency:

    see RSS at CPAN**.

    (*) http://www.syndic8.com/

    (**) http://search.cpan.org/search?mode=dist&query=RSS
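    As a rough sketch, reading such a feed with XML::RSS (the feed URL is a placeholder) takes only a few lines:

        use strict;
        use warnings;
        use LWP::Simple qw(get);
        use XML::RSS;

        my $feed = get('http://example.com/changes.rss')
            or die "fetch failed\n";

        my $rss = XML::RSS->new;
        $rss->parse($feed);

        # Each item tells you what changed and where
        for my $item ( @{ $rss->{items} } ) {
            print "$item->{title}\t$item->{link}\n";
        }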

    {\('v')/}
    _`(___)' __________________________
      Hi

      The site that I wish to monitor is a dynamic site. It may have details that are subject to random change. E.g. consider the seat status on a train or bus, or even the appointment list of a doctor. Now, on the site the list will be in the form of an Excel-style table, having the fields Patient ID, Appointment Type, Appointment Date, and Appointment Time.

      Now suppose that a patient wants an appointment. Instead of putting him at the end of the queue, we can check the appointment list for any random cancellations and put the patient in that slot (this is just an example, as obviously the next patient in the queue should be advanced). But considering how people have divided their own time into slots, the free time of the patient should match that of the vacancy in the appointment list.
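      Just as a rough sketch of that matching step (the database, table, and column names here are all hypothetical), the check for a cancellation that fits the patient's free time could look something like:

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect( 'DBI:mysql:database=clinic',   # assumed DSN
                                  'user', 'password', { RaiseError => 1 } );

          # @free lists the times the patient is available, e.g. '2004-05-04 10:00'
          sub find_free_slot {
              my @free = @_;
              my %free = map { $_ => 1 } @free;

              my $sth = $dbh->prepare(
                  'SELECT appointment_date, appointment_time
                     FROM appointments
                    WHERE status = ?
                    ORDER BY appointment_date, appointment_time'
              );
              $sth->execute('cancelled');

              while ( my ( $date, $time ) = $sth->fetchrow_array ) {
                  return "$date $time" if $free{"$date $time"};
              }
              return;    # no matching vacancy
          }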