aszl826 has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I have a question I hope you can help me with. I have a set of HTML files named like this: 1.html 2.html 3.html ... Each file has three sections: title, author, and body text. Is it easy to import these files into a MySQL database with three corresponding fields using a Perl script? I can't think of any other language that would make this easier. However, since I don't actually *know* Perl, I'd have to learn it. Any advice would be appreciated!

Re: Importing into Database
by davorg (Chancellor) on Nov 22, 2001 at 18:18 UTC

    This is exactly the kind of task that Perl excels at. One of Perl's strengths is CPAN, a library of pre-written code that you're welcome to use for your own purposes. Some modules that you'll find particularly useful here are HTML::Parser (for extracting data from an HTML file) and the combination of DBI and DBD::mysql for talking to a MySQL database.
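To make the DBI half of that suggestion concrete, here is a minimal sketch rather than a finished loader. The table name articles, its three columns, and the helper insert_rows are assumptions made for illustration; none of them come from the original question. It uses one prepared statement with placeholders, so DBI takes care of quoting the values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table layout (an assumption, not from the thread):
#   CREATE TABLE articles (title TEXT, author TEXT, body TEXT);
my $INSERT_SQL = 'INSERT INTO articles (title, author, body) VALUES (?, ?, ?)';

# Insert a list of [title, author, body] rows using one prepared statement.
# $dbh is any connected DBI database handle. Returns the number of rows given.
sub insert_rows {
    my ($dbh, @rows) = @_;
    my $sth = $dbh->prepare($INSERT_SQL);
    $sth->execute(@$_) for @rows;    # placeholders handle the quoting
    return scalar @rows;
}

# Only connect when a DSN is supplied on the command line, e.g.
#   perl load.pl 'dbi:mysql:database=test' user password
if (@ARGV) {
    require DBI;    # loaded lazily so insert_rows can be tried without a database
    my ($dsn, $user, $pass) = @ARGV;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    insert_rows($dbh, ['Example title', 'Example author', 'Example body text']);
    $dbh->disconnect;
}
```

With RaiseError set, a failed prepare or execute dies with the database's error message instead of failing silently, which is usually what you want in a one-off import script.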

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you don't talk about Perl club."

      Side note: if you're using a system that has Lynx installed, you can use it as a quick-and-dirty substitute for HTML::Parser. Using open's ability to read the output of a command, and lynx's "-dump" (iirc) switch, you can get a pre-parsed representation of the page as it would look on your console (i.e. as lynx would lay it out). This can be munged using normal means; if your HTML looks fairly simple when rendered*, this might be a win in terms of programming complexity.

      As an anecdotal usage example, I used this approach at one point to write a "screen scraper" program that pulled tens of thousands of books' Amazon sales ranks and stuck them into a database for analysis. Their HTML was fairly grotty, probably to try to prevent this sort of automated digging, but it had to look simple to a human being. In the lynx-parsed output it boiled down to one line that looked like "rank: foo", which was trivial to find and extract information from.

      HTH. :-)

      * ... and the information that you're interested in is rendered, as opposed to being in the tag structure somehow. If you care about what's in the tags, it's time to fire up the Beast that is HTML::Parser...
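The lynx approach described above could be sketched roughly like this, assuming lynx is installed and on your PATH. The helper names lynx_dump and find_field are made up for this example, and the "rank" label simply mirrors the "rank: foo" line mentioned in the anecdote. The list form of a piped open needs Perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run "lynx -dump" on one file and return its rendered text as a list of lines.
# The list form of a piped open avoids passing the file name through a shell.
sub lynx_dump {
    my ($file) = @_;
    open(my $fh, '-|', 'lynx', '-dump', $file) or die "Cannot run lynx: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Return the value from the first line that looks like "label: value",
# or nothing (undef in scalar context) if no such line exists.
sub find_field {
    my ($label, @lines) = @_;
    for my $line (@lines) {
        return $1 if $line =~ /^\s*\Q$label\E:\s*(.*?)\s*$/i;
    }
    return;
}

# Usage: perl lynx_rank.pl 1.html 2.html ...
for my $file (@ARGV) {
    my $rank = find_field('rank', lynx_dump($file));
    print "$file: ", (defined $rank ? $rank : 'not found'), "\n";
}
```

Keeping the lynx call and the line matching in separate subs means the matching can be tried on plain text first, without lynx in the picture at all.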

        That sounds like a terrible idea to me. All you'll get back from lynx -dump is plain text. There will be no structure in it at all. I'd guess that can only make it much harder to parse the data that you want out of it.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you don't talk about Perl club."

Re: Importing into Database
by tachyon (Chancellor) on Nov 22, 2001 at 20:16 UTC

    It sounds like a perfect project to learn Perl with. You can do what you want in well under thirty lines of code. You don't mention if you know *any* languages. If you do, you can learn enough Perl to do it very quickly. If you don't, it will take a little longer to wrap your head around the basic programming concepts, but you should still be able to easily achieve your goal.

    The recommended book for learning Perl is called....wait for it....Learning Perl by our own merlyn and would be a good place to start. The payoff is a very versatile tool that would let you actually do something with that data once you have it in the database ;-)

    If you don't want to learn Perl there are plenty of monks here who could write it for you (for a fee) but then you would miss a lot of fun. If you put in a little effort you will find that we will be happy to help with any problems you strike along the way.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Importing into Database
by Biker (Priest) on Nov 22, 2001 at 18:02 UTC

    It's possible. I'd even say relatively easy.

    It's impossible to program in any(?) programming language unless you know something about the language syntax. The more you know, the better. A Master would of course be able to do 'magic'.

    My personal experience is that Perl has a rather steep learning curve, but a very generous reward.

    f--k the world!!!!
    /dev/world has reached maximal mount count, check forced.

(crazyinsomniac) Re: Importing into Database
by crazyinsomniac (Prior) on Nov 23, 2001 at 09:44 UTC
    I was very endeared by the responses to this question (and a little by the question) and I shout at the top of my lungs:

    Tutorials

    Library

    good readin', so go read

    After reading perldata and perlsyn in the library (among other things) and playing with stuff, you can go to Tutorials, make sure you've got the Perl basics, and then read the tutorials on DBI and perhaps even HTML::TokeParser (the two tools you'll most likely need). Before you even begin writing code, use strict and -w. Here is starting_point.pl, which you run by typing perl starting_point.pl:
    #!/usr/bin/perl -w
    use strict;
    my $variable = "is strict compliant";
    print "My \$variable ", $variable;
    print qq'Hello world\n';
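For the HTML::TokeParser half, a rough sketch might look like the following. The thread never says how the three sections are marked up, so the layout here is purely an assumption: title in a title element, author in a meta tag with name="author", and the text inside body. The sub name extract_fields is also made up for this example, and you would adjust the tag names to match the real files.

```perl
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

# Assumed layout (hypothetical, not from the thread):
#   <title>...</title>, <meta name="author" content="...">, text inside <body>.
# Returns a hash reference with the keys title, author and body.
sub extract_fields {
    my ($html) = @_;
    my %field = (title => '', author => '', body => '');

    # A reference to a string tells HTML::TokeParser to parse that string.
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse document";

    while (my $tag = $p->get_tag('title', 'meta', 'body')) {
        my ($name, $attr) = @$tag;
        if ($name eq 'title') {
            $field{title} = $p->get_trimmed_text('/title');
        }
        elsif ($name eq 'meta') {
            $field{author} = $attr->{content}
                if lc($attr->{name} || '') eq 'author' && defined $attr->{content};
        }
        else {
            # Text up to </body>, with the tags in between skipped.
            $field{body} = $p->get_trimmed_text('/body');
            last;
        }
    }
    return \%field;
}

# Usage: perl extract.pl 1.html 2.html ...
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $html = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $f = extract_fields($html);
    print "$file: title=$f->{title}; author=$f->{author}\n";
}
```

The hash this returns has exactly the three values a DBI insert with three placeholders would need.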

     
    ___crazyinsomniac_______________________________________
    Disclaimer: Don't blame. It came from inside the void

    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Re: Importing into Database
by jarich (Curate) on Nov 24, 2001 at 10:57 UTC
    Hi Aszl826,

    I notice that no one is actually providing you with code for your question. Don't consider this too bad a thing. Anyway, if you want to read some reasonable introductory notes for Perl, check out pjf's node. These notes are written for one-day courses and assume that you are familiar with some kind of programming language and programming paradigms (like conditionals and looping constructs). The intermediate course notes have a chapter on DBI in the back too.

    Good luck.