I have been asked to take a bunch of financial data that is being ftp'd to one of our servers, parse it, stuff it in a database, and then build dynamic pages that serve customers quotes no less than 15 minutes old. The data files sent to our server are in CSV format. No quote marks (") exist (and therefore there are no problems with commas inside quotes), so using split on the data should be fine.
That turned out to be overly optimistic. Each line of the file represents one type of quote, and the format, while consistent for each quote type, varies from type to type. In other words, one line may have five fields and the next may have eight. As a result, I felt that Parse::RecDescent would be a good choice. Unfortunately, I do not know Parse::RecDescent. What follows is my first, simplistic attempt to deal with this problem (eventually, everything after the __DATA__ token will be read from a file, and the parser rules will be stored in a file as well).
    #!/usr/bin/perl
    use Parse::RecDescent;
    use strict;
    use Data::Dumper;

    $::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
    $::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
    $::RD_HINT   = 1; # Give out hints to help fix problems.

    # Create and compile the source file
    my $rules;
    my $parser = Parse::RecDescent->new( q(

        get_type      : type { $item{ type } }

        type          : /^[^,]+/
        comma         : ","
        date          : /\d\d\/{2}\d\d/
        start_date    : date
        end_date      : date
        time          : /\d\d:{2}\d\d/
        rate          : /\d+\.\d{4}/
        start_rate    : rate
        end_rate      : rate
        change        : rate
        whitespace    : /\s*/

        G017RATEBRKRL : type comma rate comma start_date comma end_date comma time
                        { return \%item }
        G017CP111_D   : type comma start_rate comma end_rate comma change comma date comma time
                        { \%item }
        G017RPAGO_N   : type comma rate comma whitespace comma whitespace comma date comma time
                        { \%item }
        G017ONFD      : type comma rate comma rate comma rate comma rate comma rate comma rate comma date comma time
                        { \@item }
        G017PDFF      : type comma rate comma rate comma rate comma rate comma date comma time
                        { \@item }
    ) );

    while ( chomp( my $quote_data = <DATA> ) ) {
        next if $quote_data !~ /\S/;
        my $quote_type = $parser->get_type( $quote_data );
        next if ! defined $quote_type;
        $quote_type =~ s/\W/_/g;
        print "* $quote_type : $quote_data *\n";
        if ( defined $quote_type ) {
            my $data = $parser->$quote_type( $quote_data ); # <-- this doesn't work :(
            if ( defined $data ) {
                print Dumper $data;
            }
            else {
                print "\$data is undefined for $quote_type\n";
            }
        }
    }

    __DATA__
    G017RATEBRKRL,4.2500,10/2/01,10/05/01,16:40:57
    G017CP111 D,2.3800,2.3300,0.0001,10/05/01,16:40:55
    G017RPAGO/N,2.4300, , ,10/05/01,16:40:58
    G017ONFD,2.3125,2.3750,2.4375,2.3750,2.4375,2.2500,10/05/01,16:40:56
    G017PDFF,2.5000,2.7500,2.2500,2.5000,10/05/01,16:40:56
The intent is to loop through the data, get the type (that worked fine), and then return a reference to the data structure. Eventually, I intend to provide handlers to automatically add the data to the database, depending upon which type is encountered.
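The per-type handlers could be organized as a dispatch table: a hash that maps each quote type to a code reference. This is only a minimal sketch under stated assumptions: `%handler_for` and `handle_quote` are hypothetical names, it splits on commas rather than using the parser, only two quote types are covered, and the handlers return hash references where the real ones would insert rows into the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handlers keyed by quote type. Real ones would insert into
# the database; these only build and return a hash reference.
my %handler_for = (
    G017RATEBRKRL => sub {
        my ( $rate, $start_date, $end_date, $time ) = @_;
        return {
            rate       => $rate,
            start_date => $start_date,
            end_date   => $end_date,
            time       => $time,
        };
    },
    G017PDFF => sub {
        my @rates = @_[ 0 .. 3 ];          # four rates come first
        my ( $date, $time ) = @_[ 4, 5 ];  # then the date and time
        return { rates => \@rates, date => $date, time => $time };
    },
);

# Split a line, look up the handler for its type (the first field) and
# call it with the remaining fields. Returns undef if no handler exists.
sub handle_quote {
    my ($line) = @_;
    my ( $type, @fields ) = split /,/, $line, -1;
    my $handler = $handler_for{$type} or return;
    return $handler->(@fields);
}
```

Adding a new quote type then means adding one entry to the hash, with no change to the loop that reads the file.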
Unfortunately, nothing is returning any data. What am I overlooking?
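One way to narrow this down is to test the grammar's terminal patterns against the sample values on their own. In a Perl regex, `\/{2}` means two consecutive slashes and `:{2}` means two consecutive colons, so the `date` and `time` patterns as written cannot match values like `10/05/01` or `16:40:57`. Since every quote-type rule includes a date and a time, that alone may be enough to make each rule fail. The `revised_*` patterns below are hypothetical replacements (note that `10/2/01` has a one-digit day), and this check uses plain Perl regexes, not Parse::RecDescent itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The date and time terminals from the grammar, plus hypothetical
# alternatives that allow a one- or two-digit month and day.
my %pattern = (
    original_date => qr/\d\d\/{2}\d\d/,          # digits, TWO slashes, digits
    original_time => qr/\d\d:{2}\d\d/,           # digits, TWO colons, digits
    revised_date  => qr/\d{1,2}\/\d{1,2}\/\d\d/,
    revised_time  => qr/\d\d:\d\d:\d\d/,
);

# Sample values taken from the __DATA__ section.
my @dates = ( '10/2/01', '10/05/01' );
my @times = ( '16:40:57', '16:40:55' );

# True only if the pattern matches every value in full.
sub matches_all {
    my ( $re, @values ) = @_;
    for my $v (@values) {
        return 0 unless $v =~ /\A$re\z/;
    }
    return 1;
}

for my $name ( sort keys %pattern ) {
    my @values = $name =~ /date/ ? @dates : @times;
    printf "%-14s %s\n", $name,
        matches_all( $pattern{$name}, @values ) ? 'matches' : 'no match';
}
```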
Another problem is more of a style issue (I think). I don't like all of those 'comma' rules in there. Is this how it's done in Parse::RecDescent or am I totally missing something?
I've been reading through a RecDescent tutorial, but I can't seem to parse anything more than simple data with this module. Further, I think I'm probably taking the wrong approach, so suggestions for other approaches would be welcome (though I'd prefer to stick with Parse::RecDescent, as knowing it would be very useful).
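As for other approaches: because the data has no quoted fields, a plain `split` plus a table of field names per quote type handles the varying field counts without any comma rules. This is a minimal sketch under assumptions, not a drop-in replacement for the grammar: `%layout` and `parse_quote` are hypothetical, the field names are made up, and it checks only the number of fields, not the format of each value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical field layouts, one per quote type, listing the fields that
# follow the type column, in order. Field names are illustrative only.
my %layout = (
    'G017RATEBRKRL' => [qw( rate start_date end_date time )],
    'G017CP111 D'   => [qw( start_rate end_rate change date time )],
    'G017RPAGO/N'   => [qw( rate blank1 blank2 date time )],
    'G017ONFD'      => [qw( rate1 rate2 rate3 rate4 rate5 rate6 date time )],
    'G017PDFF'      => [qw( rate1 rate2 rate3 rate4 date time )],
);

# Split one line and pair its fields with the layout for its type.
# Returns a hash reference, or undef if the type is unknown or the
# field count does not match the layout.
sub parse_quote {
    my ($line) = @_;
    my ( $type, @fields ) = split /,/, $line, -1;   # -1 keeps empty trailing fields
    my $names = $layout{$type} or return;
    return unless @fields == @$names;
    s/^\s+|\s+$//g for @fields;                      # trim padding such as ' '
    my %quote = ( type => $type );
    @quote{@$names} = @fields;                       # hash slice: name => value
    return \%quote;
}
```

For example, `parse_quote('G017RPAGO/N,2.4300, , ,10/05/01,16:40:58')` returns a hash reference whose `blank1` and `blank2` entries are empty strings, and a line whose field count does not match its layout returns `undef`.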
Cheers,
Ovid
In reply to A Slough of ParseRecDescent Woes by Ovid