vandale.pl

by Juerd (Abbot)
on Mar 30, 2002 at 17:54 UTC ( [id://155450]=sourcecode )
Category: Web Stuff
Author/Contact Info Juerd
Description: Because the popular gnuvd is broken, I made this quick hack to query the Van Dale website for dictionary lookups. It's a quick hack, so no production quality here ;) Oh, and please don't bother me with Getopt or HTML::Parser: Don't want to use Getopt because I don't like it, and can't use HTML::Parser because http://www.vandale.nl/ has a lot of broken HTML, and because regexes are easier (after all, it's a quick hack because I can't live without a Dutch dictionary).

This probably isn't of much use to foreigners :)

Update (200306081719+0200) - works with vandale.nl html updates now.
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my (@switches, @woorden);

while (@ARGV) {
    $_ = shift;
    if (/^--$/) {
        # Everything after a bare "--" is a word; drain @ARGV so it is not parsed again.
        push @woorden, splice @ARGV;
    } elsif (/^-/) {
        push @switches, $_;
    } else {
        push @woorden, $_;
    }
}

# \w* on both sides of the letter so bundled switches match in any order (-ah, -ha).
my $all = grep /^(?:-\w*a\w*|--all)$/, @switches;
if (grep /^(?:-\w*h\w*|--help)$/, @switches) {
    print qq{
        Usage: $0 [options] word ...
        
        options:
            -a  --all   List all matches
            -h  --help  Display usage information
    \n};
    exit 0;
}

for my $woord (@woorden) {
    $woord =~ s/(\W)/sprintf '%%%02x', ord $1/ge;

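    my $page =
        get "http://www.vandale.nl/opzoeken/woordenboek/?zoekwoord=$woord";

    # get() returns undef on failure; skip this word instead of matching against undef.
    unless (defined $page) {
        warn "Could not fetch results for '$woord'\n";
        next;
    }

    while ($page =~ s{<B><BIG>(.*?)</font>.*?((?:<DD>.*?</DD>)+)}{}si) {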
        my ($woord, $betekenis) = ($1, $2);
        for ($woord, $betekenis) {
            s[</dd>][\n]gi;
            s/<.*?>//g;
            s/&#180;/'/g;
            s/&#(\d+);/chr $1/ge;
        }
        $betekenis =~ s/^/  /gm;
        print "$woord\n$betekenis\n";
        last if not $all;
    }
}
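
To see what the extraction loop does without touching the network, here is a minimal offline sketch. The sample markup is hypothetical (it is not captured from vandale.nl) and is only shaped to fit the regex above; extract_entries is a helper name made up for this sketch. It iterates with m//g instead of deleting each match with s{}{}, which visits the same matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical markup, shaped to fit the regex in the script above.
# It is NOT real output from vandale.nl.
my $sample = <<'HTML';
<B><BIG>fiets</BIG></B></font> (de ~)
<DL><DD>1 rijwiel</DD><DD>2 z&#180;n fiets pakken &#38; wegrijden</DD></DL>
HTML

# Return a list of [headword, indented definitions] pairs.
# Cleanup mirrors the main loop: </dd> becomes a newline, remaining
# tags are dropped, and numeric character references are decoded.
sub extract_entries {
    my ($page) = @_;
    my @entries;
    while ($page =~ m{<B><BIG>(.*?)</font>.*?((?:<DD>.*?</DD>)+)}sig) {
        my ($word, $meaning) = ($1, $2);
        for ($word, $meaning) {
            s{</dd>}{\n}gi;
            s/<.*?>//g;
            s/&#180;/'/g;
            s/&#(\d+);/chr $1/ge;
        }
        $meaning =~ s/^/  /gm;    # indent each definition line
        push @entries, [$word, $meaning];
    }
    return @entries;
}

for my $entry (extract_entries($sample)) {
    print "$entry->[0]\n$entry->[1]\n";
}
```

If the site's markup changes, only the one pattern inside extract_entries should need adjusting; the cleanup steps are independent of it.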
Replies are listed 'Best First'.
(jeffa) Re: vandale.pl (with Getopt::Declare)
by jeffa (Bishop) on Mar 30, 2002 at 20:38 UTC
    Regarding Option parsing modules - this is not to bug you into using them, but rather an option for others to decide.

    I thought to myself, "hmmmm ... let's use TheDamian's Getopt::Declare" and proceeded to RTFM. I had always wanted to learn this module, and now seemed like the time.

    After about 40 minutes of racking my brain (:D) i finally came up with this:

    #!/usr/bin/perl -w
    use strict;
    use LWP::UserAgent;
    use Getopt::Declare;

    # -h, -v, --help, --version are included
    # and these are tabs - not spaces!
    my $spec = q(
    	-a	List all matches
    	--all	[ditto]
    );
    my $args = Getopt::Declare->new($spec);
    my $all = $args->{'--all'} || $args->{'-a'};

    for my $woord ($args->unused) {
        # insert for loop block innards from code above
    }

    But that is 40 minutes well spent, because now i see the power of this module. And since the script spends nearly all of its time retrieving the page from the Internet, the fact that Getopt::Declare is slower than the option parsing code above is negligible.

    P.S. i also have no qualms about using regexes to parse HTML, just as long as the coder understands how to use the CPAN HTML parsers. Sometimes using regexes really is easier. Sometimes.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      In the interest of TMTOWTDI, here's an alternate Getopt::Declare scenario that sets up the $all and @words variables in action blocks. Here I decided to make the search words required (but without an option description), and just allow -a as an abbreviation of -all (instead of using the --all version):

      #!/usr/bin/perl -w
      use strict;
      use LWP::UserAgent;
      use Getopt::Declare;
      use vars qw/$all @words/;

      my $opts = Getopt::Declare->new(<<'EOS');
      	-a[ll]	List all matches
      		{$all = 1}
      	<terms:s>...	[required]
      		{@words = @terms}
      EOS

      for my $word (@words) {
          # insert fetch code ...
          # ...
          last unless $all;
      }
      __END__
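
For anyone without Getopt::Declare installed, roughly the same -a/--all and -h/--help handling can be sketched with the core Getopt::Long module. This is a swapped-in alternative, not what either script above uses; parse_args is a helper name made up for this sketch, and the 'bundling' setting is an assumption about how the original hand-rolled parser was meant to treat combined switches such as -ah.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Assumption: single-letter switches may be combined (-ah).
Getopt::Long::Configure('bundling');

# Returns ($all, $help, @words) for a given argument list.
# Non-option arguments, and everything after "--", are left as words.
sub parse_args {
    my @args = @_;
    my ($all, $help) = (0, 0);
    GetOptionsFromArray(
        \@args,
        'all|a'  => \$all,
        'help|h' => \$help,
    ) or die "Usage: $0 [-a|--all] [-h|--help] word ...\n";
    return ($all, $help, @args);
}

my ($all, $help, @words) = parse_args(@ARGV);

for my $word (@words) {
    # insert fetch code here, ending the inner loop with:
    #     last unless $all;
}
```

Parsing a copy of the argument list rather than @ARGV itself keeps the helper easy to exercise with hand-written arguments.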
Re: vandale.pl
by cztmonk (Monk) on Jul 18, 2012 at 10:07 UTC

    When I use this code, there is no output...

      This post is ten years old, the code was last updated nine years ago. It's likely the site in question has changed substantially in that time.

        You are right, that was a stupid remark..
