Re: Patern matching html.

Hi there, I agree with davorg. If they change the format of their page, the script will break horribly. Regardless, I wrote a test script that does the same thing as yours, pretty much. I just broke it up into smaller steps.

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

use constant URL => 'http://spanish.about.com/homework/spanish/blword.htm';

my $today = (localtime)[3];
my $page = get(URL) or die "can't download page.\n";

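# grab today's entry; stop here if the page layout no longer matches.
my($entry) = $page =~ m/if \(day == $today\) [^\(]+\(\"([^\"\);]+)/;
defined $entry or die "can't find today's entry; the page format may have changed.\n";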

# remove markup.
$entry =~ s/<[^>]+>//g;

my($word, $def, $sentence, $trans) = $entry =~ m/([^:]+):([^\.]+\.)([^,]+),(.*)/;

print "word: $word\n";
print "definition: $def\n";
print "sentence: $sentence\n";
print "translation: $trans\n";

Re: Re: Patern matching html. by zzspectrez (Hermit) on Nov 25, 2000 at 05:59 UTC
I downloaded this code, and it did not work properly. When run, it printed the following: `word: definition: sentence: translation:` zzSPECTREz
Re: Re: Re: Patern matching html. by rpc (Monk) on Nov 26, 2000 at 02:19 UTC
Like we discussed earlier, at least today (the 25th) they changed the format of the definition: in what I downloaded, there was no space after the colon that follows the word. I changed the code slightly and removed the hardcoded spaces in the regex. Thanks for pointing that out :)
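For anyone following along, here is a rough sketch of what "no hardcoded spaces" could look like for the field-splitting step. It is not necessarily the exact change made to the script above. The `parse_entry` helper and the sample entries are made up for illustration (the real page content is not shown in this thread), and it assumes an entry has the shape "word: definition. sentence, translation".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an entry of the assumed form
# "word: definition. sentence, translation" into four parts.
# \s* accepts any amount of whitespace (including none) around the
# colon and the comma, and after the period that ends the definition.
sub parse_entry {
    my ($entry) = @_;
    return $entry =~ m/^\s*([^:]+?)\s*:\s*([^.]+\.)\s*([^,]+?)\s*,\s*(.*)$/;
}

# Made-up sample entries, with and without spaces after the punctuation.
my @samples = (
    'casa: house. La casa es grande, The house is big.',
    'casa:house. La casa es grande,The house is big.',
);

for my $entry (@samples) {
    my @fields = parse_entry($entry);
    if (@fields) {
        my ($word, $def, $sentence, $trans) = @fields;
        print "word: $word\n";
        print "definition: $def\n";
        print "sentence: $sentence\n";
        print "translation: $trans\n";
    }
    else {
        # An empty list means the pattern did not match at all.
        warn "entry did not match: $entry\n";
    }
}
```

Both sample entries split into the same four fields, so a missing space after the colon no longer matters. It will still break if the page drops the colon, period, or comma entirely, which is davorg's point about scraping.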