
I have been asked to take a bunch of financial data that is being FTP'd to one of our servers, parse it, stuff it in a database, and then build dynamic pages to serve customers quotes that are no less than 15 minutes old. The data files sent to our server are in CSV format. No quote marks (") exist (and therefore there are no problems with commas in quotes), so using split on the data should be fine.

That turned out to be overly optimistic. Each line of the file represents one type of quote, and the format, while consistent for each quote type, varies from type to type. In other words, one line may have five fields and the next line may have eight. As a result, I felt that using Parse::RecDescent would be a good choice. Unfortunately, I do not know Parse::RecDescent. What follows is my first, simplistic attempt to deal with this problem (eventually, everything after the __DATA__ token will be read from a file and the parser rules will be put there).

#!/usr/bin/perl
use Parse::RecDescent;
use strict;
use Data::Dumper;

$::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
$::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
$::RD_HINT   = 1; # Give out hints to help fix problems.

# Create and compile the source file
my $rules;
my $parser = Parse::RecDescent->new( q(
    get_type   : type { $item{ type } }
    type       : /^[^,]+/
    comma      : ","
    date       : /\d\d\/{2}\d\d/
    start_date : date
    end_date   : date
    time       : /\d\d:{2}\d\d/
    rate       : /\d+\.\d{4}/
    start_rate : rate
    end_rate   : rate
    change     : rate
    whitespace : /\s*/
    
    G017RATEBRKRL : type comma rate comma start_date comma end_date comma time { return \%item }
    G017CP111_D   : type comma start_rate comma end_rate comma change comma date comma time { \%item }
    G017RPAGO_N   : type comma rate comma whitespace  comma whitespace  comma date comma time { \%item }
    G017ONFD      : type comma rate comma rate comma rate comma rate comma rate comma rate comma date comma time { \@item }
    G017PDFF      : type comma rate comma rate comma rate comma rate comma date comma time { \@item }
) );

while ( chomp( my $quote_data = <DATA> ) ) {
    next if $quote_data !~ /\S/;
    my $quote_type = $parser->get_type( $quote_data );
    next if ! defined $quote_type;
    $quote_type =~ s/\W/_/g;
    print "* $quote_type : $quote_data *\n";
 
    if ( defined $quote_type ) {
        my $data = $parser->$quote_type( $quote_data ); # <-- this doesn't work :(
        if ( defined $data ) {
            print Dumper $data;
        } else {
            print "\$data is undefined for $quote_type\n";
        }
    }
}

__DATA__
G017RATEBRKRL,4.2500,10/2/01,10/05/01,16:40:57
G017CP111 D,2.3800,2.3300,0.0001,10/05/01,16:40:55
G017RPAGO/N,2.4300, , ,10/05/01,16:40:58
G017ONFD,2.3125,2.3750,2.4375,2.3750,2.4375,2.2500,10/05/01,16:40:56
G017PDFF,2.5000,2.7500,2.2500,2.5000,10/05/01,16:40:56

The intent is to loop through the data, get the type (that worked fine), and then return a reference to the data structure. Eventually, I intend to provide handlers to automatically add the data to the database, depending upon which type is encountered.
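To make that intent concrete, here is a minimal sketch of the per-type handler dispatch, independent of how each line ends up being parsed. It uses plain split rather than Parse::RecDescent only to keep the example self-contained. The names `%handler_for` and `dispatch_quote` and the shape of the returned records are hypothetical, and the handlers only build a hash where the real ones would write to the database.

```perl
use strict;
use warnings;

# Hypothetical handlers, keyed by the sanitized quote type.
# Each receives the fields of one line and returns a record;
# a real handler would insert that record into the database.
my %handler_for = (
    G017PDFF => sub {
        my @f = @_;
        return { type => $f[0], rates => [ @f[ 1 .. 4 ] ], date => $f[5], time => $f[6] };
    },
    G017RATEBRKRL => sub {
        my @f = @_;
        return {
            type       => $f[0],
            rate       => $f[1],
            start_date => $f[2],
            end_date   => $f[3],
            time       => $f[4],
        };
    },
);

sub dispatch_quote {
    my ($line) = @_;
    return unless defined $line && $line =~ /\S/;    # skip blank lines

    my @fields = split /,/, $line, -1;               # -1 keeps trailing empty fields
    ( my $type = $fields[0] ) =~ s/\W/_/g;           # same sanitizing as the loop above

    my $handler = $handler_for{$type} or return;     # unknown type: nothing returned
    return $handler->(@fields);
}

my $record = dispatch_quote('G017PDFF,2.5000,2.7500,2.2500,2.5000,10/05/01,16:40:56');
print "$record->{type} at $record->{time}\n";
```

Keeping the handlers in a hash means adding a new quote type is one more entry, whatever ends up doing the parsing.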

Unfortunately, nothing is returning any data. What am I overlooking?
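One thing that can be checked in isolation is the date and time patterns. In a Perl regex, a `{2}` quantifier applies only to the atom immediately before it, so `\/{2}` asks for two slashes in a row. The small check below, which does not use Parse::RecDescent at all, compares the patterns as written in the grammar with a pair of hypothetical alternatives. The alternatives assume dates look like `10/05/01` or `10/2/01` and times like `16:40:56`, as in the sample data. This may not be the only reason nothing is returned.

```perl
use strict;
use warnings;

my $date = '10/05/01';
my $time = '16:40:56';

# Patterns as written in the grammar: {2} binds only to the "/" or ":".
my $date_as_written = qr/\d\d\/{2}\d\d/;
my $time_as_written = qr/\d\d:{2}\d\d/;

# Hypothetical alternatives, assuming the formats seen in the sample data.
my $date_alt = qr/^\d\d?\/\d\d?\/\d\d$/;
my $time_alt = qr/^\d\d:\d\d:\d\d$/;

for my $check (
    [ 'date as written', $date, $date_as_written ],
    [ 'date alternative', $date, $date_alt ],
    [ 'time as written', $time, $time_as_written ],
    [ 'time alternative', $time, $time_alt ],
) {
    my ( $label, $text, $re ) = @$check;
    printf "%-18s %-10s %s\n", $label, $text, ( $text =~ $re ? 'matches' : 'no match' );
}
```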

Another problem is more of a style issue (I think). I don't like all of those 'comma' rules in there. Is this how it's done in Parse::RecDescent or am I totally missing something?
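On the style point, Parse::RecDescent does accept quoted string literals directly inside a production, so a separate comma rule is not strictly required. The following is only a sketch covering a single line type, G017PDFF. The date and time patterns are assumptions based on the sample data rather than the patterns in the grammar above, and the shape of the returned hash is hypothetical.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Sketch grammar for one line type only. The ',' terminals are written
# inline as string literals instead of going through a comma rule.
my $grammar = q{
    rate     : /\d+\.\d{4}/
    date     : /\d\d?\/\d\d?\/\d\d/
    time     : /\d\d:\d\d:\d\d/
    G017PDFF : 'G017PDFF' ',' rate ',' rate ',' rate ',' rate ',' date ',' time
               { $return = { rates => [ @item[ 3, 5, 7, 9 ] ], date => $item[11], time => $item[13] } }
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Could not compile grammar\n";

my $line = 'G017PDFF,2.5000,2.7500,2.2500,2.5000,10/05/01,16:40:56';
my $data = $parser->G017PDFF($line);

if ( defined $data ) {
    print "rates: @{ $data->{rates} }\n";
    print "date:  $data->{date}\n";
    print "time:  $data->{time}\n";
}
else {
    print "G017PDFF did not match\n";
}
```

The literals still occupy slots in `@item`, which is why the action picks out the odd positions explicitly. That bookkeeping is the price of dropping the named rule.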

I've been reading through a RecDescent tutorial, but I don't seem to be able to parse more than simple data with this module. Further, I think that I'm probably taking the wrong approach to this, so any suggestions as to other approaches would be useful (though I'd prefer to stick with Parse::RecDescent, as it would be very useful).

Cheers,
Ovid

Vote for paco!

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

In reply to A Slough of ParseRecDescent Woes by Ovid
