in reply to News alerts using the BBC's news ticker data file

Hi,

I thought I would brush up my rusty Parse::RecDescent skills (to be honest, they were pretty much non-existent to begin with) and see if I could parse the file in another way. Then, with the help of Mr. Muskrat's node Read "We're Going on a Bear Hunt" Out Loud, I created a virtual Brian Perkins. It is almost as good as the real one, albeit with a slight American accent.

#perl -w
use strict;
use warnings;
use Parse::RecDescent;
use Win32::OLE;
use LWP::Simple;

my ($objParser, $szData, $ptr, $szThisCat, $szThisHeadline);

#Configuration
my $ticker_data_url = 'http://tickers.bbc.co.uk/tickerdata/story2.dat';
my @readOrder = qw/WORLD UK SCI-TECH BUSINESS FINANCE SPORTS WEATHER/;

#Build the parser from the grammar in the DATA section below
$objParser = new Parse::RecDescent ( join ('', <DATA>) );
die "Bad grammar!\n" if not defined $objParser;

#Download the ticker data file from the BBC
$szData = get $ticker_data_url;
die "Couldn't retrieve ticker data" if not defined $szData;

#Parse
$ptr = $objParser->BBCFILE(\$szData);
die "Couldn't parse file!\nThis text was left:\n$szData"
    if not defined $ptr;

#Build the voice
my $voice;
$voice = Win32::OLE->new("Speech.VoiceText") or die("TTS failed");
$voice->Register("", "$0");
$voice->{Enabled} = 1;
$voice->{Speed}   = 220;

#Read the stories
foreach $szThisCat ( @readOrder ) {

    #Read the category
    print $szThisCat, "\n";
    talk("In $szThisCat news");

    #Read each of the stories
    foreach $szThisHeadline ( keys %{$ptr->{$szThisCat}} ) {
        print "\t", $szThisHeadline, "\n";
        $szThisHeadline =~ s/FTSE/footsee/o if $szThisCat eq 'FINANCE';
        talk($szThisHeadline);
    }
}

#Speak a line and wait until the voice has finished
sub talk {
    my $line = shift;
    $voice->Speak($line, 1);
    while ($voice->IsSpeaking()) {
        sleep 1;
    }
}

__DATA__
#Parse::RecDescent grammar for the BBC ticker file

#Start up actions
{
    my %category      = ();
    my $szThisSection = '**Unknown**';
}

BBCFILE: FILE_HEADER LAST_UPDATE SECTION(s) EOFILE
            { $return = \%category; }

FILE_HEADER: 'BBCONLINE:LIVE' '15' 'REFRESH REV5'
             'VERSION_WIN32 1.0.1.1' 'VERSION_WIN16 1.0.0.10'

LAST_UPDATE: 'STORY' NUMBER 'HEADLINE' 'Last update at' TIME 'URL'

SECTION: 'STORY' NUMBER 'HEADLINE' SECTION_TYPE
            { $szThisSection = $item{SECTION_TYPE}; }

SECTION_TYPE: 'WORLD' 'NEWS' DATE 'URL'    { $return = $item[1] }
            | 'UK' 'NEWS' DATE 'URL'       { $return = $item[1] }
            | 'SPORTS' 'NEWS' DATE 'URL'   { $return = $item[1] }
            | 'BUSINESS' 'NEWS' DATE 'URL' { $return = $item[1] }
            | 'SCI-TECH' 'NEWS' DATE 'URL' { $return = $item[1] }
            | 'WEATHER' DATE 'URL'         { $return = $item[1] }
            | 'TRAVEL' 'NEWS' 'URL'        { $return = $item[1] }
            | 'FINANCE' DATE 'URL'         { $return = $item[1] }
            | HEADLINE URL
                {
                    $category{$szThisSection}{$item{HEADLINE}} = $item{URL};
                    $return = $szThisSection;
                }

URL: /[^\n]+/
        {
            $item[1] =~ s/^URL\s+//o;
            $item[1] = 'N/A' if $item[1] eq '';
            $return  = $item[1];
        }

NUMBER  : /[0-9]+/
TIME    : /[0-9]{2}:[0-9]{2}/
DATE    : /[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}/
HEADLINE: /[^\n]+/
EOFILE  : /^\Z/
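
In case it helps anyone poking at the grammar, the structure handed back is just a hash of hashes: category name, then headline, then URL (or 'N/A'). A rough way to eyeball it, reusing the $ptr from the script above, would be something like:

    use Data::Dumper;

    #Dump the whole structure in one go
    print Dumper($ptr);

    #Or walk it by hand: $ptr->{CATEGORY}{HEADLINE} holds the story URL
    for my $cat (sort keys %$ptr) {
        print "$cat\n";
        for my $headline (sort keys %{ $ptr->{$cat} }) {
            print "\t$headline\n\t\t$ptr->{$cat}{$headline}\n";
        }
    }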

Regards,
Dom.

Updates: Corrected the DATE rule in the grammar, as pointed out by bfdi533 below.

Re: Re: News alerts using the BBC's news ticker data file
by bfdi533 (Friar) on Apr 01, 2003 at 18:06 UTC
    Tried this out yesterday and thought it was the bomb! Great job.
    But I tried it out today and found that it returned nothing but the categories. Checking the grammar, I found that the DATE specification fails on dates like "1 April", where the day is only one digit rather than the two-digit "01 April" the grammar expects.
    If the DATE is changed to "DATE: /0-9+ A-Za-z+ 0-9{4}/" then it works again.

    Ed

      Many thanks bfdi533. Just goes to show how rusty my grammar writing skills were/are. I've made the correction to the original node.
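
      A quick check of the corrected rule (using a couple of made-up headline dates, just for illustration) shows it now accepts both one- and two-digit days:

          use strict;
          use warnings;

          #The corrected DATE pattern from the grammar above
          my $date_re = qr/[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}/;

          for my $date ('1 April 2003', '01 April 2003') {
              print "$date: ", ($date =~ $date_re ? 'matches' : 'no match'), "\n";
          }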

      Regards,
      Dom.

      PS: You may be wondering where your [ and ] characters have gone and why strange hyperlinks have appeared. The answer is that those characters are used to create links within the site. This link, although outdated, has more information.

        Yes, I was wondering about that. Thanks for the link; I now know the "error of my ways" in that last posting.

        BTW, I think your grammar solution with [0-9]{1,2} is much more elegant than my [0-9]+.

        Ed