Ovid has asked for the wisdom of the Perl Monks concerning the following question:
I have been asked to take a bunch of financial data that is being ftp'd to one of our servers, parse it, stuff it in a database and then build dynamic pages to serve quotes to customers that are no less than 15 minutes old. The data files sent to our server are in CSV format. No quote marks (") exist (and therefore no problems with commas in quotes), so using split on the data should be fine.
That turned out to be overly optimistic. As it turns out, each line of the file represents one type of quote, and the format, while consistent for each quote type, varies from type to type. In other words, one line may have five fields and the next line may have eight. As a result, I felt that using Parse::RecDescent would be a good choice. Unfortunately, I do not know Parse::RecDescent. What follows is my first, simplistic attempt to deal with this problem (eventually, everything after the __DATA__ token will be read from a file and the parser rules will be put there).
```perl
#!/usr/bin/perl
use Parse::RecDescent;
use strict;
use Data::Dumper;

$::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
$::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
$::RD_HINT   = 1; # Give out hints to help fix problems.

# Create and compile the source file
my $rules;
my $parser = Parse::RecDescent->new( q(
    get_type      : type { $item{ type } }
    type          : /^[^,]+/
    comma         : ","
    date          : /\d\d\/{2}\d\d/
    start_date    : date
    end_date      : date
    time          : /\d\d:{2}\d\d/
    rate          : /\d+\.\d{4}/
    start_rate    : rate
    end_rate      : rate
    change        : rate
    whitespace    : /\s*/
    G017RATEBRKRL : type comma rate comma start_date comma end_date comma time
                    { return \%item }
    G017CP111_D   : type comma start_rate comma end_rate comma change comma date comma time
                    { \%item }
    G017RPAGO_N   : type comma rate comma whitespace comma whitespace comma date comma time
                    { \%item }
    G017ONFD      : type comma rate comma rate comma rate comma rate comma rate comma rate comma date comma time
                    { \@item }
    G017PDFF      : type comma rate comma rate comma rate comma rate comma date comma time
                    { \@item }
) );

while ( chomp( my $quote_data = <DATA> ) ) {
    next if $quote_data !~ /\S/;
    my $quote_type = $parser->get_type( $quote_data );
    next if ! defined $quote_type;
    $quote_type =~ s/\W/_/g;
    print "* $quote_type : $quote_data *\n";
    if ( defined $quote_type ) {
        my $data = $parser->$quote_type( $quote_data ); # <-- this doesn't work :(
        if ( defined $data ) {
            print Dumper $data;
        }
        else {
            print "\$data is undefined for $quote_type\n";
        }
    }
}
__DATA__
G017RATEBRKRL,4.2500,10/2/01,10/05/01,16:40:57
G017CP111 D,2.3800,2.3300,0.0001,10/05/01,16:40:55
G017RPAGO/N,2.4300, , ,10/05/01,16:40:58
G017ONFD,2.3125,2.3750,2.4375,2.3750,2.4375,2.2500,10/05/01,16:40:56
G017PDFF,2.5000,2.7500,2.2500,2.5000,10/05/01,16:40:56
```
The intent is to loop through the data, get the type (that worked fine), and then return a reference to the data structure. Eventually, I intend to provide handlers to automatically add the data to the database, depending upon which type is encountered.
Unfortunately, nothing is returning any data. What am I overlooking?
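One likely culprit can be checked in plain Perl, outside the grammar. In `/\d\d\/{2}\d\d/` the `{2}` quantifier binds only to the slash, so the pattern asks for two digits, two consecutive slashes and two more digits; the `time` pattern has the same shape with colons. The revised patterns below are assumptions about the intended format (they also accept one-digit date fields such as the `2` in `10/2/01`), and this sketch does not rule out other problems in the grammar.

```perl
use strict;
use warnings;

# Patterns copied from the grammar's date and time rules.
my $date_as_written = qr/\d\d\/{2}\d\d/;
my $time_as_written = qr/\d\d:{2}\d\d/;

# Hypothetical replacements: group "separator plus digits" and repeat the group.
my $date_revised = qr/\d{1,2}(?:\/\d{1,2}){2}/;
my $time_revised = qr/\d\d(?::\d\d){2}/;

# Sample values taken from the __DATA__ section of the post.
for my $sample ( '10/05/01', '10/2/01', '16:40:57' ) {
    printf "%-9s as written: %-3s revised: %s\n",
        $sample,
        ( $sample =~ /^(?:$date_as_written|$time_as_written)$/ ? 'yes' : 'no' ),
        ( $sample =~ /^(?:$date_revised|$time_revised)$/       ? 'yes' : 'no' );
}
```

For all three samples the patterns as written fail to match and the revised ones match, which would explain why every rule ending in `date comma time` comes back undefined.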
Another problem is more of a style issue (I think). I don't like all of those 'comma' rules in there. Is this how it's done in Parse::RecDescent or am I totally missing something?
I've been reading through a RecDescent tutorial, but don't seem to be able to parse more than simple data with this module. Further, I think that I'm probably taking the wrong approach to this, so any suggestions as to other approaches would be useful (though I'd prefer to stick with Parse::RecDescent, as it would be very useful).
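As one possible alternative approach: since the feed contains no quoted fields, each line could be `split` on commas and the field names looked up by quote type. The sketch below assumes the field layouts implied by the grammar rules in the post; the `%layout` table, the `parse_quote` helper and the `rateN`/`blankN` field names are illustrative, not part of any module or of the data provider's specification.

```perl
use strict;
use warnings;

# Field names per quote type, following the grammar rules in the post.
# rateN and blankN are placeholder names; the post does not say what they mean.
my %layout = (
    'G017RATEBRKRL' => [qw( rate start_date end_date time )],
    'G017CP111 D'   => [qw( start_rate end_rate change date time )],
    'G017RPAGO/N'   => [qw( rate blank1 blank2 date time )],
    'G017ONFD'      => [qw( rate1 rate2 rate3 rate4 rate5 rate6 date time )],
    'G017PDFF'      => [qw( rate1 rate2 rate3 rate4 date time )],
);

# Returns a hash reference for a recognised line, or nothing otherwise.
sub parse_quote {
    my ($line) = @_;
    my ( $type, @values ) = split /,/, $line, -1;    # -1 keeps trailing empty fields
    return unless defined $type;
    my $names = $layout{$type} or return;            # unknown quote type
    return unless @values == @$names;                # wrong number of fields
    my %quote = ( type => $type );
    @quote{@$names} = @values;                       # hash slice: names => values
    return \%quote;
}

my $quote = parse_quote('G017RATEBRKRL,4.2500,10/2/01,10/05/01,16:40:57');
print "$quote->{type}: rate $quote->{rate}, ends $quote->{end_date}\n";
```

Keying the table on the raw type string avoids rewriting `G017CP111 D` and `G017RPAGO/N` into identifier-safe names, and a per-type database handler could be stored alongside each field list in the same table.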
Cheers,
Ovid
Replies are listed 'Best First'.

Re: A Slough of ParseRecDescent Woes
by merlyn (Sage) on Oct 09, 2001 at 03:13 UTC

Re: A Slough of ParseRecDescent Woes
by Masem (Monsignor) on Oct 09, 2001 at 02:28 UTC

Re: A Slough of ParseRecDescent Woes
by runrig (Abbot) on Oct 09, 2001 at 04:03 UTC

Re: A Slough of ParseRecDescent Woes
by belden (Friar) on Oct 09, 2001 at 04:55 UTC

Re: A Slough of ParseRecDescent Woes
by BrentDax (Hermit) on Oct 09, 2001 at 08:19 UTC

Re: A Slough of ParseRecDescent Woes
by toma (Vicar) on Oct 09, 2001 at 10:37 UTC

Re: A Slough of ParseRecDescent Woes
by ehdonhon (Curate) on Oct 09, 2001 at 05:57 UTC
by Masem (Monsignor) on Oct 09, 2001 at 06:55 UTC