This is exactly the kind of task that Perl excels at.
One of Perl's strengths is CPAN which
is a library of pre-written code that you're welcome to
use for your own purposes. Some modules that you'll find
particularly useful for this purpose are
HTML::Parser (for extracting data from an HTML
file) and the combination of DBI and
DBD::mysql for talking to a MySQL database.
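As a rough illustration of how those modules fit together, here is a minimal sketch. It assumes the data you want sits in the cells of an HTML table; the `books` table, its `title` and `sales_rank` columns, and the helper names `extract_rows` and `store_rows` are hypothetical, not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Collect the text of every <td> cell in an HTML string, grouped by <tr> row.
# (Hypothetical helper; assumes the data of interest lives in table cells.)
sub extract_rows {
    my ($html) = @_;
    my (@rows, $in_cell);
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            push @rows, [] if $tag eq 'tr';
            if ($tag eq 'td') {
                push @rows, [] unless @rows;   # tolerate a <td> with no <tr>
                push @{ $rows[-1] }, '';
                $in_cell = 1;
            }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            $rows[-1][-1] .= $text if $in_cell;   # text may arrive in chunks
        }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            $in_cell = 0 if $tag eq 'td';
        }, 'tagname' ],
    );
    $parser->parse($html);
    $parser->eof;
    return grep { @$_ } @rows;   # drop rows that had no cells
}

# Insert each row through an already connected DBI handle.
# The table and column names are assumptions made for this sketch.
sub store_rows {
    my ($dbh, @rows) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO books (title, sales_rank) VALUES (?, ?)');
    $sth->execute(@$_) for @rows;
    return scalar @rows;
}

my $html = '<table><tr><td>Learning Perl</td><td>1234</td></tr></table>';
my @rows = extract_rows($html);
print join(' | ', @$_), "\n" for @rows;

# With a real MySQL server you would connect first, for example:
#   use DBI;
#   my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
#                          $user, $password, { RaiseError => 1 });
#   store_rows($dbh, @rows);
```

The placeholders (`?`) let DBI handle the quoting of the values, which matters when the text comes from a web page you don't control.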
--
<http://www.dave.org.uk>
"The first rule of Perl club is you don't talk about
Perl club."
| [reply] |
Sidenote: if you're using a system that has Lynx installed, you can use it as a quick-and-dirty substitute for HTML::Parser. Using open's slurp-output-from-a-command feature and lynx's "-dump" (iirc) switch, you can get a preparsed representation of the page as it would look on your console (i.e. as lynx would lay it out). This can be munged using normal means; if your HTML looks fairly simple when rendered*, this might be a win in terms of programming complexity.
As an anecdotal usage example, I used this approach at one point to write a "screen scraper" program to pull tens of thousands of books' Amazon sales ranks and stick them into a database for analysis. Their HTML code was fairly grotty, probably to try to prevent this sort of automated digging, but it had to look simple to a human being. In the lynx-parsed output it boiled down to one line that looked like "rank: foo", which was trivial to find and extract information from. HTH. :-)
* ... and the information that you're interested in is rendered, as opposed to being in the tag structure somehow. If you care about what's in the tags, it's time to fire up the Beast that is HTML::Parser...
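A minimal sketch of the lynx approach, assuming lynx is on your PATH and that the rendered page contains a line of the form "rank: 1,234". The helper names `lynx_dump` and `find_rank` and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run "lynx -dump URL" and return the rendered page as a list of lines.
# The list form of open avoids passing the URL through a shell.
sub lynx_dump {
    my ($url) = @_;
    open(my $fh, '-|', 'lynx', '-dump', $url)
        or die "Cannot run lynx: $!";
    my @lines = <$fh>;
    close($fh) or die "lynx failed with status $?";
    return @lines;
}

# Return the number from the first line that looks like "rank: 1,234",
# with the commas removed, or nothing if no such line exists.
sub find_rank {
    for my $line (@_) {
        if ($line =~ /\brank:\s*([\d,]+)/i) {
            (my $rank = $1) =~ tr/,//d;
            return $rank;
        }
    }
    return;
}

# Sample lines standing in for real lynx output (invented for this sketch).
my @sample = ("   Some Book Title\n", "   Rank: 1,234\n", "   Price: 9.99\n");
print find_rank(@sample), "\n";

# Against a live page you would write something like:
#   my $rank = find_rank(lynx_dump('http://www.example.com/some-page'));
```

Keeping the pattern match in its own subroutine means you can test it against saved output without hitting the site every time.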
| [reply] |
That sounds like a terrible idea to me. All you'll get
back from lynx -dump is plain text. There will be
no structure in it at all. I'd guess that can only make it
much harder to parse the data that you want out of it.
--
<http://www.dave.org.uk>
"The first rule of Perl club is you don't talk about
Perl club."
| [reply] |
It sounds like a perfect project to learn Perl with. You can do what you want in well under thirty lines of code. You don't mention if you know *any* languages. If you do, you can learn enough Perl to do it very quickly. If you don't, it will take a little longer to wrap your head around the basic programming concepts, but you should still be able to easily achieve your goal.
The recommended book for learning Perl is called....wait for it....Learning Perl by our own merlyn and would be a good place to start. The payoff is a very versatile tool that would let you actually do something with that data once you have it in the database ;-)
If you don't want to learn Perl there are plenty of monks here who could write it for you (for a fee) but then you would miss a lot of fun. If you put in a little effort you will find that we will be happy to help with any problems you strike along the way.
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
| [reply] |
It's possible. I'd even say relatively easy.
It's impossible to program in any(?) programming language unless you know something about the language syntax. The more you know, the better. A Master would of course be able to do 'magic'.
My personal experience is that Perl has a rather steep learning curve, but a very generous reward.
f--k the world!!!!
/dev/world has reached maximal mount count, check forced.
| [reply] |
I was very endeared by the responses to this question (and a little by the question) and I shout at the top of my lungs:
good readin', so go read
After reading perldata and perlsyn in the library (among other things) and playing with stuff, you can go to tutorials, make sure you've got the Perl basics, and then read the tutorials on DBI and perhaps even HTML::TokeParser (the two tools you'll most likely need). And before you even begin writing code, use strict and -w. Here is starting_point.pl, which you execute by typing perl starting_point.pl:
#!/usr/bin/perl -w
use strict;
my $variable = "is strict compliant";
print "My \$variable ", $variable, "\n";
print qq'Hello world\n';
___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;" | [reply] [d/l] |
Hi Aszl826,
I notice that no one is actually providing you with code for
your question. Don't consider this too bad a thing. Anyway,
if you want to read some reasonable introductory notes for Perl,
check out pjf's node. These notes are written for one-day
courses and assume that you are familiar with some kind of
programming language and programming concepts (like conditionals
and looping constructs). The intermediate course notes have
a chapter on DBI in the back too.
Good luck. | [reply] |