in reply to Patern matching html.

Hi there, I agree with davorg: if they change the format of their page, the script will break horribly. Regardless, I wrote a test script that does pretty much the same thing as yours; I just broke it up into smaller steps.
#!/usr/bin/perl -w
use strict;
use LWP::Simple;

use constant URL => 'http://spanish.about.com/homework/spanish/blword.htm';

# day of the month (1-31)
my $today = (localtime)[3];
my $page  = get(URL) or die "can't download page.\n";

# grab today's entry.
my($entry) = $page =~ m/if \(day == $today\) [^\(]+\(\"([^\"\);]+)/;
defined $entry or die "can't find today's entry.\n";

# remove markup.
$entry =~ s/<[^>]+>//g;

# split the entry into word, definition, sentence and translation.
my($word, $def, $sentence, $trans) =
    $entry =~ m/([^:]+):([^\.]+\.)([^,]+),(.*)/
    or die "can't parse today's entry.\n";

print "word: $word\n";
print "definition: $def\n";
print "sentence: $sentence\n";
print "translation: $trans\n";

Re: Re: Patern matching html.
by zzspectrez (Hermit) on Nov 25, 2000 at 05:59 UTC

    I downloaded this code, and it did not work properly. When run, it printed the following:

    word:
    definition:
    sentence:
    translation:

    zzSPECTREz

      As we discussed earlier, they changed the format of the definition, at least for today (the 25th): in the page I downloaded, there was no space after the colon that follows the word.

      I changed the code slightly and removed the hardcoded spaces from the regex.
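
      For illustration, here is a minimal sketch of a whitespace-tolerant version of the second match, run against strings instead of the live page. The helper name parse_entry and both sample strings are made up for this example and are not taken from the actual page; it assumes an entry looks roughly like "word: definition. sentence, translation".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "word: definition. sentence, translation"
# into four parts. Each \s* tolerates zero or more spaces around a
# delimiter, so "word:def" and "word: def" both match.
# Returns an empty list when the entry does not have the assumed shape.
sub parse_entry {
    my ($entry) = @_;
    if ($entry =~ m/^\s*([^:]+?)\s*:\s*([^.]+\.)\s*([^,]+?)\s*,\s*(.*?)\s*$/s) {
        return ($1, $2, $3, $4);
    }
    return;
}

# Made-up samples: one without a space after the colon, one with.
my @samples = (
    "gato:cat. El gato duerme, The cat sleeps.",
    "gato: cat. El gato duerme, The cat sleeps.",
);

for my $sample (@samples) {
    my @parts = parse_entry($sample);
    unless (@parts) {
        warn "could not parse: $sample\n";
        next;
    }
    printf "word: %s\ndefinition: %s\nsentence: %s\ntranslation: %s\n\n", @parts;
}
```

      Returning an empty list on a failed match lets the caller notice a format change instead of silently printing empty fields.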

      Thanks for pointing that out :)