in reply to Importing into Database

This is exactly the kind of task that Perl excels at. One of Perl's strengths is CPAN, a library of pre-written code that you're welcome to use for your own purposes. Modules that you'll find particularly useful here are HTML::Parser (for extracting data from an HTML file) and the combination of DBI and DBD::mysql (for talking to a MySQL database).
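
As a rough sketch of how those modules might fit together (not a definitive implementation): the code below assumes the data sits in plain, non-nested <td> cells, and the table name "items", its columns "name" and "price", and the DSN and credentials in the usage comment are all hypothetical placeholders.

```perl
use strict;
use warnings;
use HTML::Parser;
use DBI;

# Collect the text of every <td> cell, grouped by <tr> row.
# Assumes simple, non-nested tables; <th> cells are ignored.
sub extract_rows {
    my ($html) = @_;
    my (@rows, @cells);
    my $in_cell = 0;
    my $text    = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            if    ($tag eq 'tr') { @cells = () }
            elsif ($tag eq 'td') { $in_cell = 1; $text = '' }
        }, 'tagname' ],
        # 'dtext' passes the text with HTML entities already decoded
        text_h => [ sub { $text .= $_[0] if $in_cell }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'td' && $in_cell) {
                $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
                push @cells, $text;
                $in_cell = 0;
            }
            elsif ($tag eq 'tr') {
                push @rows, [@cells] if @cells;
                @cells = ();
            }
        }, 'tagname' ],
    );
    $parser->parse($html);
    $parser->eof;
    return @rows;
}

# Insert each row using placeholders, so DBI handles the quoting.
# "items", "name" and "price" are made-up names for illustration.
sub store_rows {
    my ($dbh, @rows) = @_;
    my $sth = $dbh->prepare('INSERT INTO items (name, price) VALUES (?, ?)');
    $sth->execute(@$_) for @rows;
    return scalar @rows;
}

# Hypothetical usage; adjust the DSN, user and password for your setup:
#   my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
#                          'user', 'password', { RaiseError => 1 });
#   store_rows($dbh, extract_rows($html));
```

Keeping the parsing and the database step in separate subs means the extraction can be checked against a sample page before anything is written to MySQL.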

--
<http://www.dave.org.uk>

"The first rule of Perl club is you don't talk about Perl club."

HTML::Parser Alternative
by Anonymous Monk on Nov 24, 2001 at 11:43 UTC
    Sidenote: if you're on a system that has Lynx installed, you can use it as a quick-and-dirty substitute for HTML::Parser. Using open's ability to read the output of a command through a pipe, together with lynx's "-dump" switch (iirc), you get a pre-rendered plain-text version of the page as it would look on your console (i.e. as lynx would lay it out). That text can be munged by normal means; if your HTML looks fairly simple when rendered*, this might be a win in terms of programming complexity.

    As an anecdotal usage example, I once used this approach to write a "screen scraper" that pulled the Amazon sales ranks of tens of thousands of books and stuck them into a database for analysis. Their HTML was fairly grotty, probably to discourage this sort of automated digging, but it had to look simple to a human being. In the lynx output it boiled down to one line that looked like "rank: foo", which was trivial to find and extract.
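
    A minimal sketch of that approach, assuming a Unix-like system with lynx on the PATH. The sub names and the exact "rank: N" line format matched by the regex are illustrative guesses, not what the real page looked like.

    ```perl
    use strict;
    use warnings;

    # Open a read pipe from "lynx -dump URL". The list form of open
    # avoids passing the URL through a shell.
    sub open_dump {
        my ($url) = @_;
        open(my $fh, '-|', 'lynx', '-dump', $url)
            or die "Cannot run lynx: $!";
        return $fh;
    }

    # Scan rendered text for the first line shaped like "rank: 1,234"
    # and return the number with any commas stripped.
    sub find_rank {
        my ($fh) = @_;
        while (my $line = <$fh>) {
            if ($line =~ /\brank:\s*([\d,]+)/i) {
                (my $rank = $1) =~ tr/,//d;
                return $rank;
            }
        }
        return;    # nothing matched
    }

    # Hypothetical usage:
    #   my $fh   = open_dump('http://www.example.com/some/page');
    #   my $rank = find_rank($fh);
    #   close $fh;
    ```

    Because find_rank only needs a filehandle, it can be tried out on a saved copy of the dump before pointing it at live pages.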

    HTH. :-)

    * ... and the information you're interested in is rendered, as opposed to being in the tag structure somehow. If you care about what's in the tags, it's time to fire up the Beast that is HTML::Parser...

      That sounds like a terrible idea to me. All you'll get back from lynx -dump is plain text. There will be no structure in it at all. I'd guess that can only make it much harder to parse out the data that you want.

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you don't talk about Perl club."